About this Individual Study Option



In this Individual Study Option the concepts of classical and quantum information theory are presented, with 
some real-life applications, such as data compression, information transmission over noisy channels and quantum 
cryptography. 

Abstract 

In each chapter the following topics are covered: 

1. The relation of information theory with physics is studied and, using the notion of Shannon's entropy,
measures of information are introduced. Then their basic properties are proven. Moreover, important
information-relevant features of quantum physics, such as the impossibility of distinguishing or cloning
states in general, are studied. In order to get a quantum measure of information, an introduction to quantum
thermodynamics is given, with a special focus on explaining the utility of the density matrix. After this,
von Neumann entropy and its related measures are defined. In this context a major discrepancy between
classical and quantum information theory is presented: quantum entanglement. The basic properties of
von Neumann entropy are proven and an information theoretic interpretation of quantum measurement
is given.

2. The amount of accessible information is obtained by Fano's inequality in the classical case, and by its
quantum analogue and Holevo's bound in the quantum case. In addition, classical and quantum
data processing is discussed.

3. Furthermore, for real-life applications, data compression is studied via Shannon's classical and Schumacher's
quantum noiseless channel coding theorems. Another application is the transmission of classical information
over noisy channels. For this a summary of classical and quantum error correction is given, and then
Shannon's classical noisy channel and the Holevo-Schumacher-Westmoreland quantum noisy channel coding
theorems are studied. The present state of transmission of quantum information over quantum channels
is summarized.

4. A practical application of the aforementioned is presented: quantum cryptography. In this context the 
BB84, a quantum key distribution protocol, is demonstrated and its security is discussed. The current 
experimental status of quantum key distribution is summarized and the possibility of a commercial device 
realizing quantum cryptography is presented. 

Special viewpoints emphasized 

The above material is in close connection with the last two chapters of [1, chap. 11, 12], and error correction is a
summary of the results given in chapter 10 of [1]. From this book Figures 1.1, 3.1, 3.2 and S.l were extracted.
However, information theory was also much influenced by [2], and some results given therein, like for example
equation (3.1), were extended to (3.2). Of equal impact was [3] concerning error correction, and [3, 4] concerning
quantum cryptography. Note that Figure 4.2 was extracted from [4]. Of course some notions which were vital
to this Individual Study Option were taken from almost all the chapters of [1].

However there were some topics that were not well explained or sometimes not emphasized enough. The most
important tool of quantum information theory, the density matrix, is misleadingly presented in [1, p. 98]: "the
density matrix [...] is mathematically equivalent to the state vector approach, but it provides a much more
convenient language for thinking about commonly encountered scenarios in quantum mechanics". The density
matrix is not equivalent to the state vector approach and is much more than just a convenient language. The
former is a tool for quantum thermodynamics, a field where it is impossible to use the latter. Rephrasing it,
in quantum thermodynamics the density matrix is the only possible language. In order to augment the
understanding of this meaningful tool, subsection 1.2.2 is devoted to it, and is close to the viewpoint of
[5, p. 295-307], which is a classical textbook for quantum mechanics.

In this Individual Study Option some topics are presented in a different perspective. There is an attempt to 
emphasize information theoretic discrepancies between classical and quantum information theory, the greatest of 
which is perhaps quantum entanglement. A very special demonstration of this difference is given in subsection 
1.2.4. It should be noted that some particular information theoretic meaning of measurement in physics is 
presented in subsection 1.2.6. 

Concerning the different views presented in this text, nothing could be more beneficial than the 64 exercises
of the last two chapters of [1, chap. 11, 12]. As an example, a delicate presentation of measures of information is
given on page 1, due to exercise 11.2 [1, p. 501], and subsection 1.2.4 on quantum entanglement was inspired by
exercise 11.14 [1, p. 514]. On some occasions special properties were proved in order to solve the exercises. Such
a property is the preservation of quantum entropy by unitary operations (property 2 of von Neumann entropy,
on page 11), which was needed to solve, for example, exercises 11.19 and 11.20 [1, p. 517, 518]. For these last two
exercises some special lemmas were proven in appendices A.1 and A.2. It is important to notice that only half of
the above mentioned exercises of [1, chap. 11, 12] are presented here.

Notation 

Finally, concerning the mathematical notation involved, the symbol $\triangleq$ should be interpreted as
"defined by", and the symbol $\equiv$ as "to be identified with".

Chapter 1

Physics and information theory 



Since its invention, computer science [6] was considered a branch of mathematics, in contrast to information
theory [7, 8], which was viewed through its physical realization; quoting Rolf Landauer [9], "information is physical".
The last decades changed the landscape, and both computers and information are now mostly approached through
their physical implementations [10, 11, 12, 13]. This view is not only more natural, but in the case of quantum
laws it gives very exciting results and sometimes an intriguing view of what information is or can be. Such an
understanding could never be inspired by mathematics alone. Moreover there is a possibility that the inverse
relation exists between physics and information [14, 15] or, quoting Steane [3], one could find new methods of
studying physics by "the ways that nature allows, and prevents, information to be expressed and manipulated,
rather than [the ways it allows] particles to move". Such a program is still in its infancy; however one relevant
application is presented in subsection 1.2.6. In this chapter the well established information theory based on
classical physics is presented in section 1.1, and the corresponding results known to date for quantum physics
are analyzed in section 1.2.

1.1 Classical physics and information theory 

In classical physics all entities have certain properties which can be known up to a desired accuracy. This fact
gives a simple pattern to store and transmit information, by assigning information content to each of the
properties a physical object can have. For example storage can be realized by writing on paper, where information
lies in each letter, or on a magnetic disk, where information, in binary digits (bits), is represented by each of the
two spin states a magnetic dipole can have. As far as transmission is concerned, speech is one example, where each
sound corresponds to a piece of information; a second example is an electronic signal on a wire, where each state
of the electricity is related to some piece of information. Unfortunately in everyday life such simple patterns
are uneconomical and unreliable. This is because communication is realized by physical entities which are
imperfect, and hence they can be influenced by environmental noise, resulting in information distortion.

Concerning the economical transmission of information, one can see that the naive pattern of information
assignment presented in the last paragraph is not always an optimal choice. This is because a message, in
the English language for example, contains symbols with different occurrence frequencies. Looking for example
at this text one can note immediately that the occurrence probability of the letter a, $p_a$, is much greater than
that of the exclamation mark, $p_!$. According to the naive assignment, English language symbols are encoded to
codewords of identical length $l$, so the average space needed to store a is $l p_a$ and that of the exclamation
mark is $l p_!$, and since $p_a > p_!$ a lot of space is wasted for the letter a. In order to present how an encoding
scheme can be economical, consider a four letter alphabet A, B, C, D, with occurrence probabilities
$p_A = \frac{3}{4}$, $p_B = \frac{1}{8}$, $p_C = p_D = \frac{1}{16}$, and the subsequent assignment of bits, in which
no codeword is a prefix of another: A → 1, B → 01, C → 000, and D → 001. A message of $n$ symbols, using this
encoding, has on average
$n\left(p_A + 2p_B + 3p_C + 3p_D\right) = n\left(\frac{3}{4} + 2\cdot\frac{1}{8} + 3\cdot\frac{1}{16} + 3\cdot\frac{1}{16}\right) = n\left(\frac{11}{8}\right)$
bits instead of the $2n$ which would be needed if somebody just mapped each letter to a two bit codeword.
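The arithmetic of the four letter example can be checked with a short sketch. It assumes the probabilities $\frac{3}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}$ and, as an illustrative choice made here, a prefix-free code with codeword lengths 1, 2, 3, 3; the function names are hypothetical.

```python
from fractions import Fraction
from math import log2

# Occurrence probabilities of the four letter alphabet.
probs = {"A": Fraction(3, 4), "B": Fraction(1, 8),
         "C": Fraction(1, 16), "D": Fraction(1, 16)}

# Illustrative prefix-free code with codeword lengths 1, 2, 3, 3 (an assumption).
code = {"A": "1", "B": "01", "C": "000", "D": "001"}


def average_length(probs, code):
    """Expected number of bits per symbol for the given code."""
    return sum(p * len(code[s]) for s, p in probs.items())


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)


avg = average_length(probs, code)                      # 11/8 bits per symbol
entropy = -sum(float(p) * log2(float(p)) for p in probs.values())
print(avg, is_prefix_free(code), round(entropy, 4))
```

The average of 11/8 bits per symbol lies below the 2 bits of the fixed-length assignment and above the entropy of the distribution, which is about 1.19 bits.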

The topics discussed in the last two paragraphs give rise to the most important information theoretic
question: what are the minimal resources needed to communicate reliably? An answer to this question can be
given by abstractly quantifying information in relation to the physical resources needed to carry it. Motivated
by the previously demonstrated four letter alphabet example, probabilities are going to be used for such an
abstraction. One now defines a function $H$, quantifying a piece of information $I$, exhibiting the following
reasonable properties:

1. $H(I)$ is a function only of the probability of occurrence $p$ of information $I$, thus $H(I) \equiv H(p)$.

2. $H$ is a smooth function.

3. The resources needed for two independent pieces of information with individual probabilities $p, q > 0$ are
the sum of the resources needed for each one alone, or in mathematical language $H(pq) = H(p) + H(q)$.

The second and third properties imply that $H$ is a logarithmic function, and by setting $q = 1$ in the third it
is immediate to see that $H(1) = 0$. Hence $H(p) = k \log_a p$, where $k$ and $a$ are constants to be determined (refer
to the comments after equations (1.2) and (1.5)). This means that the average of the resources needed when one of a
mutually exclusive set of pieces of information with probabilities $p_1, \ldots, p_n$ occurs is

$$H(p_1, \ldots, p_n) = k \sum_i p_i \log_a p_i. \tag{1.1}$$

It should be noted that probability is not the only way of quantifying information [14, 15, 16, 17, 18, 19]. 



1.1.1 Shannon entropy and relevant measures of classical information theory 

The function $H$ found in (1.1) is known in physics as entropy, and measures the disorder of a specific statistical
system. Of course one interesting physical system is an n-bit computer memory, and if all the possible cases of
data entry are described by a random variable $X$ with probabilities $p_1, \ldots, p_{2^n}$, then the computer's memory
should have an entropy given by

$$H(X) = H(p_1, \ldots, p_{2^n}) = -\sum_x p_x \log p_x. \tag{1.2}$$

Here a modified version of (1.1) is used, with $\log \equiv \log_2$ and $k \equiv -1$, an assignment to be verified after
equation (1.5). Equation (1.2) is known in information theory as Shannon's entropy [7]. There are two complementary
ways of understanding Shannon's entropy. It can be considered as a measure of uncertainty before learning the
value of a piece of physical information, or as the information gain after learning it.
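As a minimal sketch of equation (1.2), the entropy of a small memory can be evaluated numerically. The function name and the two example distributions (a uniformly random 3-bit memory and a memory in a definite state) are illustrative assumptions.

```python
from math import log2


def shannon_entropy(probs):
    """H(X) = -sum_x p_x log2 p_x, using the convention 0 log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)


n = 3
uniform = [1 / 2**n] * 2**n             # every n-bit word equally likely
definite = [1.0] + [0.0] * (2**n - 1)   # memory known to hold one specific word

print(shannon_entropy(uniform))    # maximal uncertainty: n bits
print(shannon_entropy(definite))   # no uncertainty
```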

The Shannon entropy gives rise to other important measures of information. One such is the relative entropy,
which is defined by

$$H(p_x \| q_x) = \sum_x p_x \log \frac{p_x}{q_x} = -H(X) - \sum_x p_x \log q_x, \tag{1.3}$$

and is a measure of distance between two probability distributions. This is because it can be proven that

$$H(p_x \| q_x) \geq 0, \qquad H(p_x \| q_x) = 0 \Leftrightarrow \forall x : p_x = q_x. \tag{1.4}$$

Of course it is not a metric because, as one can check, $H(p_x \| q_x) = H(q_x \| p_x)$ is not always true. The relative
entropy is often useful, not in itself, but because it helps in finding important results. One such result is derived
using the last equality in (1.3) together with (1.4), with $q_x = \frac{1}{2^n}$ for every $x$: in a memory with $n$ bits

$$H(X) \leq \log 2^n = n, \qquad H(X) = \log 2^n = n \Leftrightarrow \forall i : p_i = \frac{1}{2^n}, \tag{1.5}$$

which justifies the selection of $k = -1$ in (1.2), because in the optimal case and in the absence of noise, the maximum
physical resources needed to transmit or to store an n-bit word should not exceed $n$. One can also see from
(1.2) that

$$H(X) \geq 0, \qquad H(X) = 0 \Leftrightarrow \text{system } X \text{ is in a definite state } (p = 1). \tag{1.6}$$
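The properties of the relative entropy used above can be illustrated numerically. The sketch below assumes base 2 logarithms and two hypothetical one-bit distributions; it checks non-negativity, the lack of symmetry, and the identity $\log 2^n - H(X) = H(p_x \| u)$ with $u$ the uniform distribution, which underlies (1.5).

```python
from math import log2


def shannon_entropy(p):
    """H(X) = -sum_x p_x log2 p_x."""
    return -sum(px * log2(px) for px in p if px > 0)


def relative_entropy(p, q):
    """H(p||q) = sum_x p_x log2(p_x / q_x); assumes q_x > 0 wherever p_x > 0."""
    return sum(px * log2(px / qx) for px, qx in zip(p, q) if px > 0)


uniform = [0.5, 0.5]      # uniform distribution of a one-bit memory (n = 1)
skewed = [0.25, 0.75]     # hypothetical non-uniform distribution

forward = relative_entropy(uniform, skewed)
backward = relative_entropy(skewed, uniform)
gap_to_maximum = 1 - shannon_entropy(skewed)   # log 2^n - H(X) with n = 1

print(forward, backward, gap_to_maximum)
```

The two orderings give different values, so the relative entropy is not a metric, while `backward` coincides with `gap_to_maximum`.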

Other important results are deduced from the relative entropy, and concern useful entropic quantities such as the
joint entropy, the entropy of X conditional on knowing Y, and the common or mutual information of X and
Y. These entropies are correspondingly defined by the subsequent intuitive relations

$$H(X,Y) \triangleq -\sum_{xy} p_{xy} \log p_{xy}, \tag{1.7}$$

$$H(X|Y) \triangleq H(X,Y) - H(Y), \tag{1.8}$$

$$H(X:Y) \triangleq H(X) + H(Y) - H(X,Y) = H(X) - H(X|Y), \tag{1.9}$$

and can be represented in the 'entropy Venn diagram' as shown in Figure 1.1.

[Figure: entropy Venn diagram with regions labelled H(X) and H(Y)]



Figure 1.1: Relationships between different entropies. 
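The definitions (1.7) to (1.9) translate directly into a few lines of code. The following sketch uses a hypothetical joint distribution of two bits, stored as a dictionary keyed by (x, y); all names are illustrative.

```python
from math import log2


def entropy(dist):
    """Shannon entropy of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, axis):
    """Marginal distribution of component `axis` of a joint distribution."""
    out = {}
    for outcome, p in joint.items():
        out[outcome[axis]] = out.get(outcome[axis], 0.0) + p
    return out


def conditional_entropy(joint):
    """H(X|Y) = H(X,Y) - H(Y), equation (1.8)."""
    return entropy(joint) - entropy(marginal(joint, 1))


def mutual_information(joint):
    """H(X:Y) = H(X) + H(Y) - H(X,Y), equation (1.9)."""
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)


# Hypothetical joint distribution p(x, y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

h_x = entropy(marginal(joint, 0))
print(entropy(joint), conditional_entropy(joint), mutual_information(joint))
```

For this distribution the joint entropy is 1.5 bits and $H(X|Y)$ is 0.5 bits, and the two expressions for the mutual information in (1.9) agree.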

1.1.2 Basic properties of Shannon entropy 

It is worth mentioning here the basic properties of Shannon entropy: 

1. $H(X,Y) = H(Y,X)$, $H(X:Y) = H(Y:X)$.

2. $H(Y|X) \geq 0$ and thus by the second equality of (1.9) $H(X:Y) \leq H(Y)$, with equality if and only if Y
is a function of X, $Y = f(X)$.

3. $H(X) \leq H(X,Y)$, with equality if and only if Y is a function of X.

4. Subadditivity: $H(X,Y) \leq H(X) + H(Y)$, with equality if and only if X and Y are independent
random variables.

5. $H(Y|X) \leq H(Y)$ and thus by the second equality of (1.9) $H(X:Y) \geq 0$, with equality in each if and
only if X and Y are independent variables.

6. Strong subadditivity: $H(X,Y,Z) + H(Y) \leq H(X,Y) + H(Y,Z)$, with equality if and only if
$Z \rightarrow Y \rightarrow X$ forms a Markov chain.

7. Conditioning reduces entropy: $H(X|Y,Z) \leq H(X|Y)$.

8. Chaining rule for conditional entropies: Let $X_1, \ldots, X_n$ and $Y$ be any set of random variables, then
$H(X_1, \ldots, X_n|Y) = \sum_{i=1}^{n} H(X_i|Y, X_1, \ldots, X_{i-1})$.

9. Concavity of the entropy: Suppose there are probabilities $q_i \geq 0$, $p_i > 0$; then
$H\left(\sum_i p_i q_i\right) \geq \sum_i p_i H(q_i)$, with equality if and only if the $q_i$ are identical.

The various relationships between entropies may mostly be derived from the 'entropy Venn diagram' shown 
in Figure 1.1. Such figures are not completely reliable as a guide to properties of entropy, but they provide a 
useful mnemonic for remembering the various definitions and properties of entropy. The proofs of the above 
mentioned properties follow. 

Proof

1. Obvious from the definition of joint entropy (1.7) and mutual information (1.9).

2. Based on the definition of conditional entropy (1.8),

$$\begin{aligned}
H(Y|X) &\triangleq H(X,Y) - H(X) = -\sum_{xy} p(x,y) \log p(x,y) + \sum_{x} p(x) \log p(x) \\
&= -\sum_{xy} p(x,y) \log p(x,y) + \sum_{xy} p(x,y) \log p(x) = -\sum_{xy} p(x,y) \log \frac{p(x,y)}{p(x)} \geq 0,
\end{aligned}$$

because $p(x,y) \leq p(x)$ makes every term of the last sum non-positive. Equality holds if and only if
$p(x,y) = p(x)$ whenever $p(x,y) > 0$, that is $Y = f(X)$.

3. It is immediately proven by the second property, using the definition of conditional entropy (1.8).

4. Following the subsequent steps of calculation

$$\begin{aligned}
H\left(p(x,y) \| p(x)p(y)\right) &= -H(p(x,y)) - \sum_{xy} p(x,y) \log p(x)p(y) \\
&= -H(p(x,y)) - \sum_{xy} p(x,y) \log p(x) - \sum_{xy} p(x,y) \log p(y) \\
&= -H(p(x,y)) + H(p(x)) + H(p(y)),
\end{aligned}$$

where the second equality of (1.3) was used. The result follows directly from equation (1.4).

5. It is easily derived from subadditivity (property 4) and the definition of conditional entropy (1.8).

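6. First note that by the definition of the joint entropy (1.7) and some algebra
$H(X,Y,Z) + H(Y) - H(X,Y) - H(Y,Z) = \sum_{xyz} p(x,y,z) \log \frac{p(x,y)\,p(y,z)}{p(x,y,z)\,p(y)}$. Then using the fact
that $\log x \leq \frac{x-1}{\ln 2}$ for all positive $x$, with equality achieved if and only if $x = 1$, the following
can be concluded

$$\begin{aligned}
\sum_{xyz} p(x,y,z) \log \frac{p(x,y)\,p(y,z)}{p(x,y,z)\,p(y)}
&\leq \frac{1}{\ln 2} \sum_{xyz} p(x,y,z) \left( \frac{p(x,y)\,p(y,z)}{p(x,y,z)\,p(y)} - 1 \right) \\
&= \frac{1}{\ln 2} \left( \sum_{y} \frac{p(y)\,p(y)}{p(y)} - 1 \right) = \frac{1}{\ln 2} (1 - 1) = 0,
\end{aligned}$$

with equality if and only if $\frac{p(x,y)\,p(y,z)}{p(x,y,z)\,p(y)} = 1 \Leftrightarrow p(z|y) = p(z|x,y) \Leftrightarrow X \rightarrow Y \rightarrow Z$
is a Markov chain, q.e.d.

7. From strong subadditivity (property 6) it is straightforward that $H(X,Y,Z) - H(Y,Z) \leq H(X,Y) - H(Y)$,
and from the definition of conditional entropy (1.8) the result is obvious.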

8. First the result is proven for $n = 2$ using the definition of conditional entropy (1.8)

$$\begin{aligned}
H(X_1, X_2|Y) &= H(X_1, X_2, Y) - H(Y) = H(X_1, X_2, Y) - H(X_1, Y) + H(X_1, Y) - H(Y) \\
&= H(X_2|Y, X_1) + H(X_1|Y).
\end{aligned}$$

Now induction is going to be used to prove it for every $n$. Assume that the result holds for $n$; then
using the one for $n = 2$, $H(X_1, \ldots, X_{n+1}|Y) = H(X_2, \ldots, X_{n+1}|Y, X_1) + H(X_1|Y)$, and applying the
inductive hypothesis to the first term on the right hand side gives

$$H(X_1, \ldots, X_{n+1}|Y) = \sum_{i=2}^{n+1} H(X_i|Y, X_1, \ldots, X_{i-1}) + H(X_1|Y) = \sum_{i=1}^{n+1} H(X_i|Y, X_1, \ldots, X_{i-1}),$$

q.e.d.

9. The concavity of Shannon entropy will be deduced from the concavity of von Neumann's entropy in 1.38.
However it is demonstrated here that the binary entropy, $H_{\mathrm{bin}}(p) \triangleq H(p, 1-p)$, is strictly concave,

$$H_{\mathrm{bin}}(p x_1 + (1-p) x_2) \geq p H_{\mathrm{bin}}(x_1) + (1-p) H_{\mathrm{bin}}(x_2),$$

where $0 \leq p, x_1, x_2 \leq 1$ and equality holds only in the trivial cases $x_1 = x_2$, or $p = 0$, or $p = 1$. This
follows from the derivatives of the binary entropy. The first derivative is
$\frac{d}{dp} H_{\mathrm{bin}}(p) = -\log p - \frac{1}{\ln 2} + \log(1-p) + \frac{1}{\ln 2}$, which vanishes if and only if
$p = \frac{1}{2}$, and for $p \neq 0, 1$ the second derivative is
$\frac{d^2}{dp^2} H_{\mathrm{bin}}(p) = -\frac{1}{\ln 2} \left( \frac{1}{p} + \frac{1}{1-p} \right) < 0$. A strictly negative second
derivative means that the function is strictly concave, so the inequality above can become an equality only in
the trivial cases, and the maximum of the binary entropy is reached at $p = \frac{1}{2}$.

Some additional notes on the properties of Shannon entropy 

Concluding the properties of Shannon entropy, it should be noted that the mutual information is not always
subadditive or superadditive. One counterexample for the first is the case where X and Y are independent
identically distributed random variables taking the values 0 or 1 with probability one half. Let $Z = X \oplus Y$,
where $\oplus$ is the modulo 2 addition; then $H(X,Y:Z) = 1$ and further calculating $H(X:Z) + H(Y:Z) = 0$, that is

$$H(X,Y:Z) \nleq H(X:Z) + H(Y:Z).$$

The counterexample concerning the second case is that of a random variable $X_1$ taking the values 0 or 1
with probability one half and $X_2 \equiv Y_1 \equiv Y_2 \equiv X_1$. Then $H(X_1:Y_1) + H(X_2:Y_2) = 2$ and in addition to this
$H(X_1, X_2 : Y_1, Y_2) = 1$, which means that

$$H(X_1:Y_1) + H(X_2:Y_2) \nleq H(X_1, X_2 : Y_1, Y_2).$$
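Both counterexamples can be verified by direct enumeration. In the sketch below joint distributions are dictionaries keyed by outcome tuples, and the mutual information between two groups of variables is computed from index tuples; these representations and names are illustrative choices.

```python
from itertools import product
from math import log2


def entropy(dist):
    """Shannon entropy of a distribution given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def marginal(joint, idx):
    """Marginal distribution over the components listed in `idx`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, a, b):
    """H(A:B) = H(A) + H(B) - H(A,B) for two groups of component indices."""
    return (entropy(marginal(joint, a)) + entropy(marginal(joint, b))
            - entropy(marginal(joint, a + b)))


# Outcomes (x, y, z): X and Y are fair independent bits and Z = X xor Y.
xor_joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

# Outcomes (x1, x2, y1, y2): all four variables are copies of one fair bit.
copy_joint = {(b, b, b, b): 0.5 for b in (0, 1)}

lhs_sub = mutual_information(xor_joint, (0, 1), (2,))             # H(X,Y:Z)
rhs_sub = (mutual_information(xor_joint, (0,), (2,))
           + mutual_information(xor_joint, (1,), (2,)))           # H(X:Z) + H(Y:Z)
lhs_sup = (mutual_information(copy_joint, (0,), (2,))
           + mutual_information(copy_joint, (1,), (3,)))          # H(X1:Y1) + H(X2:Y2)
rhs_sup = mutual_information(copy_joint, (0, 1), (2, 3))          # H(X1,X2:Y1,Y2)
print(lhs_sub, rhs_sub, lhs_sup, rhs_sup)
```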

1.2 Quantum physics and information theory 

Quantum theory is another very important area of physics, which is used to describe the elementary particles that
make up our world. The laws and the intuition of quantum theory are totally different from the classical case.
To be more specific, quantum theory is considered counterintuitive or, quoting Richard Feynman, "nobody
really understands quantum mechanics". However quantum physics offers new phenomena and properties which
can change people's view of information. These properties are going to be investigated in this section.

1.2.1 Basic features of quantum mechanics relevant to information theory 

Mathematically quantum entities are represented by Hilbert space vectors $|\psi\rangle$, usually normalized,
$\langle\psi|\psi\rangle = 1$. Quantum systems evolve unitarily, that is, if a system is initially in a state
$|\psi_1\rangle$, it later becomes another state $|\psi_2\rangle$ after a unitary operation

$$U|\psi_1\rangle = |\psi_2\rangle. \tag{1.10}$$

Unitary operations are reversible, since $UU^\dagger = I$, and previous states can be reconstructed by
$|\psi_1\rangle = U^\dagger|\psi_2\rangle$. What is important about such operations is that because
$\langle\psi_2|\psi_2\rangle = \langle\psi_1|U^\dagger U|\psi_1\rangle = \langle\psi_1|I|\psi_1\rangle = 1$, normalization,
which will soon be interpreted as probability, is conserved. The measurement of the properties of these
objects is described by a collection of operators $\{M_m\}$. The quantum object will be found in the m-th state
with probability

$$p(m) = \langle\psi|M_m^\dagger M_m|\psi\rangle, \tag{1.11}$$

and after this measurement it is going to be in a definite state, possibly different from the starting one,

$$\frac{M_m|\psi\rangle}{\sqrt{\langle\psi|M_m^\dagger M_m|\psi\rangle}}. \tag{1.12}$$

Of course $m$ should be viewed as the measurement outcome, hence as information extracted from a physical system.

This way information content can be assigned to each state of a quantum object. As a practical example, in
each energy state of an atom one can map four numbers, or in each polarization state of a photon one can
map two numbers, say 0 and 1. The last case is similar to the bits of classical information theory, but because
a photon is a quantum entity they are named qubits (quantum bits). It should be stressed here that the
measurement operators should satisfy the completeness relation $\sum_m M_m^\dagger M_m = I$, which results in
$\sum_m p(m) = \sum_m \langle\psi|M_m^\dagger M_m|\psi\rangle = 1$ as instructed by probability theory. However this implies
that in general $\langle\psi|M_m^\dagger M_m|\psi\rangle \neq 1$, and looking at equation (1.12) one understands that
measurement is an irreversible operation.
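The measurement rules (1.11) and (1.12) can be made concrete for a single qubit. The sketch below assumes the two projectors onto $|0\rangle$ and $|1\rangle$ as measurement operators and a hypothetical input state; state vectors are plain lists of complex amplitudes.

```python
from math import sqrt, isclose


def apply(matrix, state):
    """Matrix-vector product for small matrices given as nested lists."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def norm_squared(state):
    """<psi|psi> for a state given as a list of amplitudes."""
    return sum(abs(c) ** 2 for c in state)


def measure(operators, state):
    """List of (p(m), post-measurement state), following (1.11) and (1.12)."""
    results = []
    for M in operators:
        unnormalised = apply(M, state)
        p = norm_squared(unnormalised)        # <psi| M^dagger M |psi>
        post = [c / sqrt(p) for c in unnormalised] if p > 0 else None
        results.append((p, post))
    return results


# Projectors onto |0> and |1>; they satisfy the completeness relation.
M0 = [[1, 0], [0, 0]]
M1 = [[0, 0], [0, 1]]

# Hypothetical state |psi> = sqrt(0.2)|0> + i sqrt(0.8)|1>.
psi = [sqrt(0.2), 1j * sqrt(0.8)]
outcomes = measure([M0, M1], psi)
print([p for p, _ in outcomes])
```

The probabilities sum to one, and each post-measurement state is a basis state from which the original amplitudes cannot be recovered, which illustrates the irreversibility of measurement.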

What is very interesting about quantum entities is that they can either be in a definite state or in a
superposition of states! Mathematically this is written

$$|\psi\rangle = \sum_s c_s |s\rangle.$$

Using the language of quantum theory, $|\psi\rangle$ is in state $|s\rangle$ with probability $c_s^* c_s$, and because of
normalization the total probability of measuring $|\psi\rangle$ is

$$\langle\psi|\psi\rangle = \sum_s c_s^* c_s = 1. \tag{1.13}$$

States $|s\rangle$ are usually orthonormal to each other, hence

$$\langle s|\psi\rangle = c_s. \tag{1.14}$$

Although being simultaneously in many states sounds weird, quantum information can be very powerful in
computing. Suppose some quantum computer takes as input quantum objects which are in a superposition
of multiple states; then the output is going to be quantum objects which of course are going to be in multiple
states too. This way one can have many calculations done in only one computational step! However careful
extraction of results is needed [13, 20, 21, 22, 23], because quantum measurement has as its outcome only one
answer from the superposition of multiple states, as equation (1.12) instructs, and further information is lost.
Then one can have incredible results, like for example calculating discrete logarithms and factorizing numbers in
polynomial time [20, 21], or searching an unsorted database of $N$ objects with only $\sqrt{N}$ iterations [22, 23]!

In contrast to classical information, which under perfect conditions can be known up to a desired accuracy,
quantum information is sometimes ambiguous. This is because one cannot distinguish non-orthogonal states
reliably. Assume for a while that such a distinction is possible for two non-orthogonal states $|\psi_1\rangle$ and
$|\psi_2\rangle$ and a collection of measurement operators $\{M_m\}$. Then according to this assumption some of the
measuring operators give reliable information on whether the measured quantity is $|\psi_1\rangle$ or $|\psi_2\rangle$, and
collecting them together the following two distinguishing POVM elements can be defined

$$E_i \triangleq \sum_{M_m \text{ measuring } i} M_m^\dagger M_m, \quad i = 1, 2.$$

The assumption that these states can be reliably distinguished is expressed mathematically as

$$\langle\psi_i|E_i|\psi_i\rangle = 1, \quad i = 1, 2. \tag{1.15}$$

Since $\sum_{i=1,2} E_i = I$ it follows that $\sum_{i=1,2} \langle\psi_1|E_i|\psi_1\rangle = 1$. Because the operator $E_1$ reliably
measures the first state, $\langle\psi_1|E_1|\psi_1\rangle = 1$, hence the other term must be

$$\langle\psi_1|E_2|\psi_1\rangle = 0. \tag{1.16}$$

Suppose $|\psi_2\rangle$ is decomposed into $|\psi_1\rangle$ and a state orthogonal to $|\psi_1\rangle$, say $|\psi_\perp\rangle$; then
$|\psi_2\rangle = \alpha|\psi_1\rangle + \beta|\psi_\perp\rangle$. Of course $|\alpha|^2 + |\beta|^2 = 1$ and $|\beta| < 1$ because
$|\psi_1\rangle$ and $|\psi_2\rangle$ are not orthogonal. Using the last decomposition and (1.16),
$\langle\psi_2|E_2|\psi_2\rangle = |\beta|^2 \langle\psi_\perp|E_2|\psi_\perp\rangle \leq |\beta|^2 < 1$, which is in contradiction with (1.15).






Concerning the results of the last paragraph, it should additionally be mentioned that information gain
implies disturbance. Let $|\psi\rangle$ and $|\phi\rangle$ be non-orthogonal states; without loss of generality assume that a
unitary process is used to obtain information with the aid of an ancilla system $|u\rangle$. Assuming that such a
process does not disturb the system, then in both cases one obtains

$$|\psi\rangle \otimes |u\rangle \rightarrow |\psi\rangle \otimes |v\rangle,$$

$$|\phi\rangle \otimes |u\rangle \rightarrow |\phi\rangle \otimes |v'\rangle.$$

Then one would like $|v\rangle$ and $|v'\rangle$ to be different, in order to acquire information about the states. However
since inner products are preserved under unitary transformations and $\langle\psi|\phi\rangle \neq 0$,
$\langle v|v'\rangle \langle\psi|\phi\rangle = \langle u|u\rangle \langle\psi|\phi\rangle \Rightarrow \langle v|v'\rangle = \langle u|u\rangle = 1$,
and hence $|v\rangle$ and $|v'\rangle$ are identical. Thus distinguishing between $|\psi\rangle$ and $|\phi\rangle$ must inevitably
disturb at least one of these states.

However, at least theoretically, there is always a way of distinguishing orthonormal states. Suppose the $|i\rangle$
are orthonormal; then it is straightforward to define the set of operators $M_i = |i\rangle\langle i|$ plus the operator
$M_0 = I - \sum_{i \neq 0} |i\rangle\langle i|$, which satisfy the completeness relation. Now if the state $|i\rangle$ is prepared
then $p(i) = \langle i|M_i^\dagger M_i|i\rangle = 1$, thus they are reliably distinguished.

Another very surprising result is the prohibition of copying arbitrary quantum states. This is known as the
no-cloning theorem [24, 25] and it can be proven very easily. Suppose it is possible to have a quantum
photocopying machine, which will have as input a quantum white paper $|w\rangle$ and a state to be copied. The quantum
photocopying machine should be realized by a unitary transformation $U$, and if somebody tries to photocopy
two states $|\psi\rangle$ and $|\phi\rangle$, it should very naturally work as follows

$$U(|w\rangle \otimes |\psi\rangle) = |\psi\rangle \otimes |\psi\rangle,$$

$$U(|w\rangle \otimes |\phi\rangle) = |\phi\rangle \otimes |\phi\rangle.$$

Now taking the inner product of these relations, $\langle\psi|\phi\rangle = \langle\psi|\phi\rangle^2$; thus $\langle\psi|\phi\rangle = 0$ or
$\langle\psi|\phi\rangle = 1$, hence $|\psi\rangle = |\phi\rangle$ or $|\psi\rangle$ and $|\phi\rangle$ are orthogonal. This means that cloning is
allowed only for orthogonal states! Thus at least somebody can construct a device, by quantum circuits, to copy
orthogonal states. For example if $|\psi\rangle$ and $|\phi\rangle$ are orthogonal then there exists a unitary transformation $U$
such that $U|\psi\rangle = |0\rangle$ and $U|\phi\rangle = |1\rangle$. Then by applying the FANOUT quantum gate, which maps the
input to $|00\rangle$ if the input qubit was $|0\rangle$ and to $|11\rangle$ if it was $|1\rangle$, and further applying
$U^\dagger \otimes U^\dagger$ to the tensor product of the outgoing qubits, either the state $|\psi\rangle$ is copied, finally
giving $|\psi\psi\rangle$, or $|\phi\rangle$ is copied, giving $|\phi\phi\rangle$.
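The behaviour of the FANOUT gate on orthogonal and non-orthogonal inputs can be sketched numerically. The code below assumes that FANOUT acts as a controlled-NOT with the data qubit as control and a blank qubit prepared in $|0\rangle$ as target (so the data qubit is written first, a convention chosen here), and it uses $(|0\rangle + |1\rangle)/\sqrt{2}$ as a hypothetical state that is not orthogonal to $|0\rangle$.

```python
from math import sqrt, isclose


def kron(a, b):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in a for y in b]


def fanout(state):
    """Controlled-NOT on two qubits, basis order |00>, |01>, |10>, |11>."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def inner(a, b):
    """Inner product <a|b>."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


zero, one = [1.0, 0.0], [0.0, 1.0]
plus = [1 / sqrt(2), 1 / sqrt(2)]          # (|0> + |1>)/sqrt(2)

copied_zero = fanout(kron(zero, zero))     # |00>: a faithful copy of |0>
copied_one = fanout(kron(one, zero))       # |11>: a faithful copy of |1>
attempt = fanout(kron(plus, zero))         # (|00> + |11>)/sqrt(2)

# Overlap of the attempted copy with the desired product state |+>|+>.
fidelity = abs(inner(kron(plus, plus), attempt)) ** 2
print(copied_zero, copied_one, fidelity)
```

The orthogonal basis states are copied exactly, whereas the superposition yields an output whose overlap with two independent copies is only one half.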

The basic information relevant features of quantum mechanics, analyzed in this subsection, are summarized 
in Figure 1.2. 



Basic features of quantum mechanics 

1. Reversibility of quantum operations 

2. Irreversibility of measurement 

3. Probabilistic outcomes 

4. Superposition of states 

5. Distinguishability of orthogonal states by operators 

6. Non-distinguishability of non-orthogonal states 

7. Information gain implies disturbance 

8. Non-cloning of non-orthogonal states 



Figure 1.2: A summary of the basic information-relevant features of quantum mechanics.



1.2.2 The language of quantum information theory: quantum thermodynamics 

As in the case of classical information theory, thermodynamics should be studied for abstracting quantum
information. As was already stated in the last paragraphs, quantum mechanics has some sort of built-in
probabilities. However this is not enough. The probabilistic features of quantum states are different from those
of classical thermodynamics. The difference is demonstrated by taking two totally independent facts A and B.
Then the classical probability of one or the other occurring would be $P^{\mathrm{classical}}_{AB} = P_A + P_B$. In contrast,
in quantum mechanics the wave functions should be added, and by calculating the inner product the probabilities
are obtained. More analytically

$$P^{\mathrm{quantum}}_{AB} = \left(\langle\psi_A| + \langle\psi_B|\right)\left(|\psi_A\rangle + |\psi_B\rangle\right) = \left(\langle\psi_A|\psi_A\rangle + \langle\psi_B|\psi_B\rangle\right) + 2\,\mathrm{Re}\,\langle\psi_A|\psi_B\rangle \neq P^{\mathrm{classical}}_{AB} \text{ in general.} \tag{1.17}$$

Moreover if somebody decides to encode some information with quantum objects then he is going to be interested
in the probabilities of occurrence of the alphabet, exactly the same way as was done in section 1.1, and he would
never like to interfere with the probabilities already found in quantum entities. For this reason a quantum version
of thermodynamics is needed. In the sequel the name thermodynamical probabilities is used to distinguish the
statistical mixture of several quantum states from the quantum probabilities occurring by observation of a
quantum state.

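The basic tool of quantum thermodynamics is the density matrix, and its simplest case is when the quantum
state occurs with thermodynamical probability $p = 1$. Then by (1.14) $\langle t|\psi\rangle\langle\psi|s\rangle = c_t c_s^*$ and therefore

$$\rho \triangleq |\psi\rangle\langle\psi|$$

is the natural matrix generalization of a quantum vector state. This is the definition of the density matrix of a
pure state, in contrast to a mixture of states, where several states occur with probabilities $0 < p < 1$. Each
element of this matrix is

$$\rho_{ts} \triangleq \langle t|\psi\rangle\langle\psi|s\rangle = c_t c_s^*,$$

and by equation (1.13), where the normalization of the vector was defined, the density matrix is correspondingly
normalized by

$$\mathrm{tr}\rho = \sum_s \rho_{ss} = \sum_s \langle s|\psi\rangle\langle\psi|s\rangle = 1. \tag{1.18}$$

Moreover it is straightforward that unitary evolution (1.10) is described by

$$U|\psi\rangle\langle\psi|U^\dagger = U\rho U^\dagger, \tag{1.19}$$

the probability of measuring the m-th state (1.11) is given by

$$p(m) = \langle\psi|M_m^\dagger M_m|\psi\rangle = \langle\psi|M_m^\dagger \sum_s |s\rangle\langle s| M_m|\psi\rangle = \sum_s \langle s|M_m|\psi\rangle\langle\psi|M_m^\dagger|s\rangle = \mathrm{tr}\left(M_m \rho M_m^\dagger\right), \tag{1.20}$$

and the state after measurement (1.12) is obtained by

$$\frac{M_m|\psi\rangle\langle\psi|M_m^\dagger}{\langle\psi|M_m^\dagger M_m|\psi\rangle} = \frac{M_m \rho M_m^\dagger}{\mathrm{tr}\left(M_m \rho M_m^\dagger\right)}. \tag{1.21}$$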



Suppose now there is a collection of quantum states $|\psi_i\rangle$ occurring with thermodynamical probabilities $p_i$;
then the mixed density operator is simply defined as

$$\rho \triangleq \sum_i p_i |\psi_i\rangle\langle\psi_i|. \tag{1.22}$$

This construction is precisely what was expected at the beginning of this subsection, because the
thermodynamical probabilities are real numbers and can be added, in contrast to the situation of quantum
probabilities for a state vector (1.17). The generalizations of normalization (1.18), unitary evolution (1.19),
probability of measurement (1.20) and measurement (1.21) for the mixed density matrix are

$$\mathrm{tr}\rho = \sum_i p_i\, \mathrm{tr}\left(|\psi_i\rangle\langle\psi_i|\right) = 1,$$

$$\sum_i p_i U|\psi_i\rangle\langle\psi_i|U^\dagger = U\rho U^\dagger,$$

$$p(m) = \mathrm{tr}\left(M_m \rho M_m^\dagger\right),$$

$$\frac{M_m \rho M_m^\dagger}{\mathrm{tr}\left(M_m \rho M_m^\dagger\right)}.$$

The above results are pretty obvious for the first, the second and the fourth, but some algebra of probabilities
is needed for the third (refer to [1, p. 99] for details). The unitary evolution of the system can be generalized by
quantum operations

$$\mathcal{E}(\rho) = \sum_i E_i \rho E_i^\dagger,$$

where $\sum_i E_i^\dagger E_i = I$. Quantum operations are sometimes referred to in the literature [2] as superoperators.
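A small numerical sketch may help to fix the mixed density matrix (1.22) and the measurement probability $p(m) = \mathrm{tr}(M_m \rho M_m^\dagger)$. It assumes a hypothetical equal mixture of $|0\rangle$ and $(|0\rangle + |1\rangle)/\sqrt{2}$ and the projectors onto $|0\rangle$ and $|1\rangle$ as measurement operators; matrices are nested lists of complex numbers.

```python
from math import sqrt, isclose


def outer(psi):
    """|psi><psi| as a nested list."""
    return [[a * b.conjugate() for b in psi] for a in psi]


def mix(states, probs):
    """rho = sum_i p_i |psi_i><psi_i|, equation (1.22)."""
    dim = len(states[0])
    rho = [[0j] * dim for _ in range(dim)]
    for psi, p in zip(states, probs):
        proj = outer(psi)
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * proj[r][c]
    return rho


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def probability(M, rho):
    """p(m) = tr(M rho M^dagger)."""
    return trace(matmul(matmul(M, rho), dagger(M))).real


zero = [1 + 0j, 0j]
plus = [1 / sqrt(2) + 0j, 1 / sqrt(2) + 0j]
rho = mix([zero, plus], [0.5, 0.5])        # [[0.75, 0.25], [0.25, 0.25]]

M0 = [[1 + 0j, 0j], [0j, 0j]]
M1 = [[0j, 0j], [0j, 1 + 0j]]
print(trace(rho).real, probability(M0, rho), probability(M1, rho))
```

The thermodynamical probabilities (one half each) and the quantum probabilities of the individual states combine into the outcome probabilities 0.75 and 0.25.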

Quantum thermodynamics, as just described, can be viewed as a transition from classical to quantum
thermodynamics. Suppose an ensemble of $n$ particles in equilibrium is given. For this ensemble assume that the
i-th particle is located at $x_i$, has velocity $v_i$ and energy $E_i(x_i, v_i)$. Classical thermodynamics says that for the
i-th particle there is a probability

$$P_i = \frac{1}{\sum_j e^{-\beta E_j}}\, e^{-\beta E_i} \tag{1.23}$$

to be in this state, where $\beta = \frac{1}{k_B T}$, with $k_B$ Boltzmann's constant and $T$ the temperature. Now if the
particles were described by quantum mechanics, then the i-th would have an eigenenergy $E_i$, given by the solution
of the Schrödinger equation $H|\psi_i\rangle = E_i|\psi_i\rangle$, where $H$ is the Hamiltonian of the system. Now with the help
of equation (1.23) the density matrix as defined in (1.22) can be written as

$$\rho = \sum_i \frac{1}{\sum_j e^{-\beta E_j}}\, e^{-\beta E_i} |\psi_i\rangle\langle\psi_i| = \frac{1}{\mathrm{tr}\, e^{-\beta H}}\, e^{-\beta H},$$

which is a generalization of the classical thermodynamical probability (1.23). This transition is illustrated in Figure
1.3.
1.3. 



[Figure: two panels. First panel: statistical behavior of classical particles, having velocity $v_i$, at $x_i$, and
energy $E_i(v_i, x_i)$, with probability $P_i$. Second panel: statistical behavior of quantum particles, at state
$|\psi_i\rangle$, given by $H|\psi_i\rangle = E_i|\psi_i\rangle$, with probability $P_i = \frac{1}{\mathrm{tr}\, e^{-\beta H}} e^{-\beta E_i}$.]

Figure 1.3: Transition from classical to quantum thermodynamics. 



1.2.3 Von Neumann entropy and relevant measures of quantum information theory

In order to describe quantum information, one could use a quantum version of entropy. To justify its
mathematical definition, recall how Shannon entropy was given in equation (1.2), and assume that a density
matrix $\rho$ is diagonalized by $\rho = \sum_x \lambda_x |x\rangle\langle x|$. Then naturally the quantum entropy of $\rho$ is defined as

$$S(\rho) \triangleq -\sum_x \lambda_x \log \lambda_x. \tag{1.24}$$

Translating this into the mathematical formalism developed in the last subsection, the von Neumann entropy
is defined as

$$S(\rho) \triangleq -\mathrm{tr}\left(\rho \log \rho\right). \tag{1.25}$$






The last formula is often used for proving theoretical results and equation (1.24) is used for calculations. As an
example the von Neumann entropy of $\rho = p\,|0\rangle\langle 0| + (1-p)\,\frac{(|0\rangle+|1\rangle)(\langle 0|+\langle 1|)}{2}$ is found to be

$$ S(\rho) = S\begin{pmatrix} p + \frac{1-p}{2} & \frac{1-p}{2} \\ \frac{1-p}{2} & \frac{1-p}{2} \end{pmatrix} = -\lambda_1\log\lambda_1 - \lambda_2\log\lambda_2, $$

where $\lambda_1 = \frac{1+\sqrt{1+2p^2-2p}}{2}$ and $\lambda_2 = \frac{1-\sqrt{1+2p^2-2p}}{2}$ are the eigenvalues of the corresponding matrix.
Surprisingly $S(\rho) \neq H(p, 1-p)$, even though the same probabilities were assigned in both cases! This shows
that quantum probabilities are not expelled by quantum thermodynamics. The equality can only hold if the
probabilities written in Shannon's entropy are the eigenvalues of the density matrix.
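
The example above can be reproduced numerically. This is a minimal sketch, assuming NumPy and base-2 logarithms; it takes the second component of the mixture to be the state $(|0\rangle+|1\rangle)/\sqrt{2}$, which is the reading consistent with the quoted eigenvalues, and the helper names are hypothetical.

```python
import numpy as np

def shannon_bits(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    p = np.asarray(probabilities, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def von_neumann_bits(rho):
    """Von Neumann entropy in bits, computed from the eigenvalues as in (1.24)."""
    return shannon_bits(np.linalg.eigvalsh(rho))

def example_state(p):
    """p |0><0| + (1 - p) |+><+|, with |+> = (|0> + |1>)/sqrt(2) (assumed reading)."""
    ket0 = np.array([1.0, 0.0])
    ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
    return p * np.outer(ket0, ket0) + (1 - p) * np.outer(ket_plus, ket_plus)

p = 0.5
root = np.sqrt(1 + 2 * p**2 - 2 * p)
closed_form_eigenvalues = [(1 + root) / 2, (1 - root) / 2]

entropy_quantum = von_neumann_bits(example_state(p))
entropy_closed_form = shannon_bits(closed_form_eigenvalues)
entropy_classical = shannon_bits([p, 1 - p])
print(entropy_quantum, entropy_closed_form, entropy_classical)
```

For $p = 1/2$ the Shannon entropy of the mixing probabilities is one bit, while the von Neumann entropy is strictly smaller, because the two mixed states are not orthogonal.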

Following the same path as for classical information theory, in the quantum case it is straightforward to
define the joint entropy, the relative entropy of $\rho$ to $\sigma$, the entropy of $A$ conditional on knowing $B$ and the
common or mutual information of $A$ and $B$. These are, respectively,

$$ S(A,B) \triangleq -\mathrm{tr}\left(\rho^{AB}\log\rho^{AB}\right), \tag{1.26a} $$

$$ S(\rho\|\sigma) \triangleq \mathrm{tr}(\rho\log\rho) - \mathrm{tr}(\rho\log\sigma), \tag{1.26b} $$

$$ S(A|B) \triangleq S(A,B) - S(B), \tag{1.26c} $$

$$ S(A:B) \triangleq S(A) + S(B) - S(A,B) = S(A) - S(A|B). \tag{1.26d} $$

One can see that there are a lot of similarities between Shannon's and von Neumann's entropy. As such one can
prove a result reminiscent of equation (1.4),

$$ S(\rho\|\sigma) \geq 0, \tag{1.27a} $$

$$ S(\rho\|\sigma) = 0 \Leftrightarrow \rho = \sigma, \tag{1.27b} $$

which is known as Klein's inequality. This inequality provides evidence of why von Neumann relative entropy is
close to the notion of a metric. What is also important is that it can be used, as in the classical case, to prove
a result corresponding to equation (1.5) of Shannon's entropy,

$$ S(\rho) \leq \log d, \tag{1.28a} $$

$$ S(\rho) = \log d \Leftrightarrow \rho = \tfrac{1}{d} I. \tag{1.28b} $$

In addition to this, from the definition (1.25) it follows that

$$ S(\rho) \geq 0, \tag{1.29a} $$

$$ S(\rho) = 0 \Leftrightarrow \rho \text{ is pure}, \tag{1.29b} $$

which resembles equation (1.6).

One can also prove that if some states $\rho_i$, with probabilities $p_i$, have support on orthogonal subspaces,
then

$$ S\Big(\sum_i p_i\rho_i\Big) = H(p_i) + \sum_i p_i S(\rho_i). $$

Directly from this relation the joint entropy theorem can be proven: supposing that $p_i$ are probabilities,
$|i\rangle$ are orthogonal states for a system $A$, and $\rho_i$ is a set of density matrices of another system $B$, then

$$ S\Big(\sum_i p_i |i\rangle\langle i| \otimes \rho_i\Big) = H(p_i) + \sum_i p_i S(\rho_i). \tag{1.30} $$

Using the definition of von Neumann entropy (1.25) and the above theorem for the case where $\rho_i = \sigma$
for every $i$, and letting $\rho$ be a density matrix with eigenvalues $p_i$ and eigenvectors $|i\rangle$, the entropy of a tensor
product $\rho\otimes\sigma$ is found to be

$$ S(\rho\otimes\sigma) = S(\rho) + S(\sigma). \tag{1.31} $$

Another interesting result can be derived from the Schmidt decomposition: if a composite system $AB$ is in a pure
state, its subsystems $A$ and $B$ have density matrices with equal eigenvalues, and by (1.24)

$$ S(A) = S(B). \tag{1.32} $$
1.2.4 A great discrepancy between classical and quantum information theory: quantum entanglement

The tools developed in the above subsection can help reveal a great discrepancy between classical and quantum
information theory: entanglement. Concerning the nomenclature, in quantum mechanics a state of a composite
system is called entangled if it cannot be written as a tensor product of states of its subsystems. To demonstrate
the aforementioned discrepancy, let for example a composite system $AB$ be in an entangled pure state $|AB\rangle$; then, because
of entanglement, its Schmidt decomposition must be a sum of more than one term,

$$ |AB\rangle = \sum_{i\in I} \lambda_i |i_A\rangle |i_B\rangle, \quad \text{with } |I| > 1, \tag{1.33} $$

where $|i_A\rangle$ and $|i_B\rangle$ are orthonormal bases. The corresponding density matrix is obviously
$\rho^{AB} = |AB\rangle\langle AB| = \sum_{i,j\in I} \lambda_i\lambda_j |i_A\rangle|i_B\rangle\langle j_A|\langle j_B|$. As usual, the density matrix of the subsystem $B$ can be found by tracing out
system $A$,

$$ \rho^B = \mathrm{tr}_A\left(\rho^{AB}\right) = \sum_{i,j,k\in I} \lambda_i\lambda_j \langle k_A|i_A\rangle\, |i_B\rangle \langle j_A|k_A\rangle \langle j_B| = \sum_{i\in I} \lambda_i^2\, |i_B\rangle\langle i_B|. $$

Now because of the assumption $|I| > 1$ in (1.33) and the fact that the $|i_B\rangle$ are orthonormal, so that it is impossible
to collect them together in a single tensor product, subsystem $B$ is not pure. Thus by (1.29a) and (1.29b) $S(B) > 0$;
$AB$ is pure, thus by (1.29b) $S(A,B) = 0$, and by (1.26c) $S(A|B) < 0$. The last steps can be repeated
backwards, and the conclusion which can be drawn is that a pure composite system $AB$ is entangled if and only
if $S(A|B) < 0$.
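
The criterion $S(A|B) < 0$ for pure states can be illustrated numerically. The sketch below assumes NumPy and base-2 logarithms, and uses two illustrative two-qubit states chosen here (they are not taken from the text at this point): the entangled state $(|00\rangle+|11\rangle)/\sqrt{2}$ and a product state. The function names are hypothetical.

```python
import numpy as np

def von_neumann_bits(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def conditional_entropy_a_given_b(psi, dim_a, dim_b):
    """S(A|B) = S(A,B) - S(B), as in (1.26c), for a pure state vector psi on A (x) B."""
    rho_ab = np.outer(psi, psi.conj())
    # Partial trace over A: reshape to indices (a, b, a', b') and trace a with a'.
    rho_b = np.trace(rho_ab.reshape(dim_a, dim_b, dim_a, dim_b), axis1=0, axis2=2)
    return von_neumann_bits(rho_ab) - von_neumann_bits(rho_b)

entangled = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)               # (|00> + |11>)/sqrt(2)
product = np.kron([1.0, 0.0], np.array([1.0, 1.0]) / np.sqrt(2))      # |0> (x) |+>

s_entangled = conditional_entropy_a_given_b(entangled, 2, 2)
s_product = conditional_entropy_a_given_b(product, 2, 2)
print(s_entangled, s_product)
```

The entangled state gives $S(A|B) = -1$ bit, while the product state gives zero, in line with the equivalence stated above.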

Of course in classical information theory conditional entropy could only be $H(X|Y) \geq 0$ (property 2 in
subsection 1.1.2) and that is obviously the reason why entangled states did not exist at all! This is an exclusive
feature of quantum information theory: a very intriguing, or better to say a very entangled feature, which until
now is not well understood by physicists. However it has incredible applications, such as quantum cryptography,
which will be the main topic of chapter 4. Concerning the nomenclature, entangled states are named after
the fact that $S(A|B) < 0 \Leftrightarrow S(A,B) < S(B)$, which means that the ignorance about a system $B$ can be, in
quantum mechanics, more than the ignorance about both $A$ and $B$! This suggests some correlation between these
two systems.

How can nature have such a peculiar property? Imagine a simple pair of quantum particles, with two possible
states each, $|0\rangle$ and $|1\rangle$. Then a possible formulation of entanglement is the state $|\psi\rangle = \frac{|0\rangle\otimes|0\rangle + |1\rangle\otimes|1\rangle}{\sqrt{2}}$. After
a measurement $M_m$ of the first particle, for example, according to (1.12)

$$ M_m|\psi\rangle = \frac{1}{\sqrt{2}}\, M_m\big(|m\rangle\otimes|m\rangle + |1-m\rangle\otimes|1-m\rangle\big) \;\longrightarrow\; |m\rangle\otimes|m\rangle \quad \text{(after normalization)}, $$

hence they both collapse to state $|m\rangle$. This example sheds light on the quantum property where the ignorance about
both particles is smaller than the ignorance about one of them, since perfect knowledge about one implies perfect
knowledge about the second.

1.2.5 Basic properties of von Neumann entropy 

The basic properties of von Neumann entropy, which can be compared to the properties of Shannon entropy 
discussed in subsection 1.1.2, are: 

1. $S(A,B) = S(B,A)$, $S(A:B) = S(B:A)$.

2. Unitary operations preserve entropy: $S(U\rho U^\dagger) = S(\rho)$.

3. Subadditivity: $S(A,B) \leq S(A) + S(B)$.

4. $S(A,B) \geq |S(A) - S(B)|$ (triangle or Araki-Lieb inequality).

5. Strict concavity of the entropy: Suppose there are probabilities $p_i \geq 0$ and the corresponding density
matrices $\rho_i$; then $S\left(\sum_i p_i\rho_i\right) \geq \sum_i p_i S(\rho_i)$, and $S\left(\sum_i p_i\rho_i\right) = \sum_i p_i S(\rho_i) \Leftrightarrow$ the $\rho_i$ for which $p_i > 0$ are all
identical.

6. Upper bound of a mixture of states: Suppose $\rho = \sum_i p_i\rho_i$, where $p_i$ are probabilities and $\rho_i$ the corresponding
density matrices; then $S(\rho) \leq \sum_i p_i S(\rho_i) + H(p_i)$, and $S(\rho) = \sum_i p_i S(\rho_i) + H(p_i) \Leftrightarrow$ the $\rho_i$ have
support on orthogonal subspaces.

7. Strong subadditivity: $S(A,B,C) + S(B) \leq S(A,B) + S(B,C)$, or equivalently $S(A) + S(B) \leq S(A,C) + S(B,C)$.

8. Conditioning reduces entropy: $S(A|B,C) \leq S(A|B)$.

9. Discarding quantum systems never increases mutual information: Suppose $ABC$ is a composite
quantum system; then $S(A:B) \leq S(A:B,C)$.

10. Trace preserving quantum operations never increase mutual information: Suppose $AB$ is a
composite quantum system and $\mathcal{E}$ is a trace preserving quantum operation on system $B$. Let $S(A:B)$
denote the mutual information between systems $A$ and $B$ before $\mathcal{E}$ is applied to system $B$, and $S(A':B')$
the mutual information after $\mathcal{E}$ is applied to system $B$. Then $S(A':B') \leq S(A:B)$.

11. Relative entropy is jointly convex in its arguments: let $0 \leq \lambda \leq 1$; then

$$ S\big(\lambda A_1 + (1-\lambda)A_2 \,\|\, \lambda B_1 + (1-\lambda)B_2\big) \leq \lambda S(A_1\|B_1) + (1-\lambda) S(A_2\|B_2). $$

12. The relative entropy is monotonic: $S\left(\rho^A\|\sigma^A\right) \leq S\left(\rho^{AB}\|\sigma^{AB}\right)$.
Proof 

1. Obvious from the definition of joint entropy (1.26a) and mutual entropy (1.26d). 

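2. Let $U$ be a unitary matrix; then $S(U\rho U^\dagger) = -\mathrm{tr}\left[U\rho U^\dagger \log\left(U\rho U^\dagger\right)\right] = -\mathrm{tr}\left[U\rho U^\dagger U(\log\rho)U^\dagger\right]$, where the
fact that $U\rho U^\dagger$ and $\rho$ are similar, and hence have the same eigenvalues, was employed. Furthermore
$U$ is unitary, hence $U^\dagger U = I$, and the proof is concluded: $S(U\rho U^\dagger) = -\mathrm{tr}\left[U\rho(\log\rho)U^\dagger\right] = -\mathrm{tr}\left[U^\dagger U\rho(\log\rho)\right] = -\mathrm{tr}(\rho\log\rho) \triangleq S(\rho)$,
where $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ was used in the second step and $U^\dagger U = I$ in the third.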

3. Refer to [1, p.516]. 

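4. Assume a fictitious system $R$ purifying the state $\rho^{AB}$; then by applying subadditivity (property 3)
$S(R) + S(A) \geq S(A,R)$. Because $\rho^{ABR}$ is pure, by (1.32) $S(R) = S(A,B)$ and $S(A,R) = S(B)$.
Combining the last two equations with the last inequality, $S(A,B) \geq S(B) - S(A)$. Moreover, because
$S(A,B) = S(B,A)$, $A$ and $B$ can be interchanged and then $S(A,B) \geq S(A) - S(B)$, thus
$S(A,B) \geq |S(A) - S(B)|$. It is obvious by (1.31) that the equality holds if $\rho^{AR} = \rho^A\otimes\rho^R$. This is hard
to understand because the system $R$ was artificially introduced. Another way to obtain the equality condition
is by assuming that $\rho^{AB}$ has a spectral decomposition $\rho^{AB} = \sum_{ik}\lambda_{ik}\,|i_A\rangle\langle i_A| \otimes |k_B\rangle\langle k_B|$, and obviously
$\rho^A = \mathrm{tr}_B\,\rho^{AB} = \sum_i\left(\sum_k\lambda_{ik}\right)|i_A\rangle\langle i_A|$ and $\rho^B = \sum_k\left(\sum_i\lambda_{ik}\right)|k_B\rangle\langle k_B|$; then one can write

$$ S(A,B) = S(B) - S(A) \;\Leftrightarrow\; \sum_{ik}\lambda_{ik}\log\lambda_{ik} = \sum_{ik}\lambda_{ik}\log\frac{\sum_j\lambda_{jk}}{\sum_m\lambda_{im}} \;\Leftrightarrow\; \lambda_{ik} = \frac{\sum_j\lambda_{jk}}{\sum_m\lambda_{im}}. $$

Summing over $k$ in the last equation,

$$ \sum_k\lambda_{ik} = \frac{\sum_{jk}\lambda_{jk}}{\sum_m\lambda_{im}} = \frac{1}{\sum_k\lambda_{ik}} \;\Leftrightarrow\; \Big(\sum_k\lambda_{ik}\Big)^2 = 1 \;\Leftrightarrow\; \sum_k\lambda_{ik} = 1, $$

where the last relation holds because $\lambda_{ik} \geq 0$. Now combining the last two outcomes,

$$ S(A,B) = S(B) - S(A) \;\Leftrightarrow\; \lambda_{ik} = \sum_j\lambda_{jk} \;\Leftrightarrow\; \lambda_{ik} = \lambda_k. $$

Rephrasing this result, the equality condition holds if and only if the matrices $\mathrm{tr}_B\left(|i\rangle\langle i|\right)$ have a common
eigenbasis and the matrices $\mathrm{tr}_A\left(|i\rangle\langle i|\right)$ have orthogonal support. As an example of the above comment
consider the systems $\rho^A = |0\rangle\langle 0|$ and $\rho^B = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$; then the entropy of each is $S(A) = 0$ and
$S(B) = 1$, and finally the joint system $\rho^{AB} = \rho^A\otimes\rho^B = \frac{1}{2}|00\rangle\langle 00| + \frac{1}{2}|01\rangle\langle 01|$ has entropy $S(A,B) = 1$.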
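5. Introduce an auxiliary system $B$, whose state has an orthonormal basis $|i\rangle$ corresponding to the states $\rho_i$
of system $A$. The joint system will have the state $\rho^{AB} = \sum_i p_i\rho_i\otimes|i\rangle\langle i|$, and its entropy according to the
joint entropy theorem (1.30) is $S(A,B) = H(p_i) + \sum_i p_i S(\rho_i)$. Moreover the entropy of each subsystem will
be $S(A) = S\left(\sum_i p_i\rho_i\right)$ and $S(B) = S\left(\sum_i p_i|i\rangle\langle i|\right) = H(p_i)$, and by applying subadditivity (property
3) $S(A,B) \leq S(A) + S(B)$, q.e.d. Concerning its equality conditions, assume $\rho = \rho_i$ for every $i$; then
$\rho^{AB} = \sum_i p_i\rho_i\otimes|i\rangle\langle i| = \rho\otimes\sum_i p_i|i\rangle\langle i| = \rho^A\otimes\rho^B$, and equality follows from (1.31).
Conversely, if $S\left(\sum_i p_i\rho_i\right) = \sum_i p_i S(\rho_i)$, suppose there is at least one density matrix which is not
equal to the others, say $\sigma \triangleq \rho_j$; then

$$ S(q\rho + p\sigma) = qS(\rho) + pS(\sigma), \tag{1.34} $$

where the following quantities were defined: $q \triangleq \sum_{i\neq j}p_i$, $p \triangleq p_j$ and $\rho \triangleq \rho_i$ for $i\neq j$. If the density
matrices $\rho$, $\sigma$ have spectral decompositions, say $\rho = \sum_i\lambda_i|i\rangle_\rho\langle i|_\rho$ and $\sigma = \sum_j\kappa_j|j\rangle_\sigma\langle j|_\sigma$, it is easy to find
out that

$$ qS(\rho) + pS(\sigma) = -\sum_m q\lambda_m\log(q\lambda_m) - \sum_m p\kappa_m\log(p\kappa_m). \tag{1.35} $$

This was the right hand side of (1.34). To get the left hand side assume that the matrix $q\rho + p\sigma$ has a
spectral decomposition $q\rho + p\sigma = \sum_m\mu_m|m\rangle_{\rho\sigma}\langle m|_{\rho\sigma}$, and that there are unitary matrices connecting these bases,
$|i\rangle_\rho = \sum_m u_{im}|m\rangle_{\rho\sigma}$ and $|j\rangle_\sigma = \sum_m w_{jm}|m\rangle_{\rho\sigma}$. This implies that

$$ \begin{aligned} S(q\rho + p\sigma) &= -\sum_n\langle n|\Big(q\sum_{iml}\lambda_i u_{im}|m\rangle_{\rho\sigma}\,u^*_{il}\langle l|_{\rho\sigma} + p\sum_{jml}\kappa_j w_{jm}|m\rangle_{\rho\sigma}\,w^*_{jl}\langle l|_{\rho\sigma}\Big) \\ &\qquad\times\log\Big(q\sum_{iml}\lambda_i u_{im}|m\rangle_{\rho\sigma}\,u^*_{il}\langle l|_{\rho\sigma} + p\sum_{jml}\kappa_j w_{jm}|m\rangle_{\rho\sigma}\,w^*_{jl}\langle l|_{\rho\sigma}\Big)|n\rangle \\ &= -\sum_n\Big(q\sum_{iml}\lambda_i u_{im}u^*_{il} + p\sum_{jml}\kappa_j w_{jm}w^*_{jl}\Big)\log\Big(q\sum_{iml}\lambda_i u_{im}u^*_{il} + p\sum_{jml}\kappa_j w_{jm}w^*_{jl}\Big) \\ &= -\sum_n(q\lambda_n + p\kappa_n)\log(q\lambda_n + p\kappa_n). \end{aligned} \tag{1.36} $$

In the last step the fact that $u_{ij}$ and $w_{ij}$ are unitary [$UU^\dagger = I \Leftrightarrow \sum_j u_{ij}u^*_{lj} = \delta_{il}$] was employed. Taking
the left and right hand side of equation (1.34) as found in (1.36) and (1.35) correspondingly, it is simple
to verify that

$$ \sum_m q\lambda_m\log\Big(1 + \frac{p\kappa_m}{q\lambda_m}\Big) + \sum_m p\kappa_m\log\Big(1 + \frac{q\lambda_m}{p\kappa_m}\Big) = 0. $$

The fact that $\log\left(1 + \frac{p\kappa_m}{q\lambda_m}\right), \log\left(1 + \frac{q\lambda_m}{p\kappa_m}\right) \geq 0$ implies that both summations are greater than or equal to zero,
and the last equality leaves no other case than both of them being zero. This can only happen if for the
non-zero $\lambda_m$ the $\kappa_m$ are null. An alternative proof that von Neumann entropy is concave can be presented by
defining $f(p) \triangleq S(p\rho + (1-p)\sigma)$. Then by calculus, if $f''(p) \leq 0$, then $f$ is concave, that is, for $0 \leq p \leq 1$,
$f(px + (1-p)y) \geq pf(x) + (1-p)f(y)$. Selecting $x = 1$, $y = 0$ gives $f(p) \geq pf(1) + (1-p)f(0)$, which
according to the definition of $f$ implies that $S(p\rho + (1-p)\sigma) \geq pS(\rho) + (1-p)S(\sigma)$.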

6. Refer to [1, p.518,519]. 

7. For the proof refer to [1, p. 519-521]. The fact that these inequalities are equivalent will be presented
here. If $S(R) + S(B) \leq S(R,C) + S(B,C)$ holds, then by introducing an auxiliary system $A$ which
purifies the system $RBC$, $S(R) = S(A,B,C)$ and $S(R,C) = S(A,B)$, so the last inequality becomes
$S(A,B,C) + S(B) \leq S(A,B) + S(B,C)$. Conversely, if $S(R,B,C) + S(B) \leq S(R,B) + S(B,C)$ holds, by
inserting again a system $A$ purifying system $RBC$, because $S(R,B,C) = S(A)$ and $S(R,B) = S(A,C)$,
the last inequality becomes $S(A) + S(B) \leq S(A,C) + S(B,C)$. From this, another equivalent way to
write strong subadditivity is $0 \leq S(C|A) + S(C|B)$, or $S(A) + S(B) - S(A,B) + S(A) + S(C) - S(A,C) \leq 2S(A) \Leftrightarrow S(A:B) + S(A:C) \leq 2S(A)$.
This inequality corresponds to the second property of Shannon
entropy, where $H(X:Y) \leq H(X)$, which is not always true for quantum information theory. For
example, the composite system $|AB\rangle = \frac{1}{\sqrt{2}}|00\rangle + \frac{1}{\sqrt{2}}|11\rangle$ has entropy $S(A,B) = 0$ because it is pure, and
$\rho^A = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, thus $S(A) = 1$ and similarly $S(B) = 1$. Hence $S(A:B) = 2 > S(A)$.

8. Refer to [1, p.523]. 

9. Refer to [1, p.523]. 

10. Refer to [1, p.523]. 

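11. For the proof refer to [1, p. 520]. Concerning this property it should be emphasized that joint convexity
implies convexity in each input; this is obvious by selecting $B_1 = B_2 \triangleq B$ or $A_1 = A_2 \triangleq A$. The converse
is not true. The same holds for concavity: for example $f(x,y) = -x^2e^y$ is concave in $x$, because $\frac{\partial^2 f}{\partial x^2} = -2e^y \leq 0$ for
every $x, y$, and concave in $y$, because $\frac{\partial^2 f}{\partial y^2} = -x^2e^y \leq 0$ for every $x, y$. However, $f$ is not jointly concave: for a suitable
pair of points and weights, the value of $f$ at the convex combination, $\approx -0.57$, is less than the convex combination of the values of $f$, $\approx -0.41$.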

12. Refer to [1, p.524,525]. 

Using strict concavity to prove other properties of von Neumann entropy 

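Strict concavity (property 5) of von Neumann entropy can be used to prove (1.28b), and moreover that the
completely mixed state $\frac{1}{d}I$ on a space of $d$ dimensions is the unique state of maximal entropy. To do this the
following result is stated: for a $d\times d$ normal matrix $A$ there exists a set of unitary matrices $U_i^{(A)}$ such that

$$ \sum_{i=1}^d U_i^{(A)} A\, U_i^{(A)\dagger} = \mathrm{tr}(A)\, I. \tag{1.37} $$

For a proof refer to appendix A.1 (the proof of a more general proposition is due to Abbas Edalat). To prove
the uniqueness of $\frac{1}{d}I$ as a maximal state, take $A \triangleq \rho$ to be any quantum state; then

$$ S(\rho) = \frac{1}{d}\sum_{i=1}^d S(\rho) = \frac{1}{d}\sum_{i=1}^d S\left(U_i^{(\rho)}\rho\, U_i^{(\rho)\dagger}\right) \leq S\Big(\sum_{i=1}^d\frac{1}{d}U_i^{(\rho)}\rho\, U_i^{(\rho)\dagger}\Big) = S\left(\tfrac{1}{d}I\right), $$

where property 2 was used in the second step, property 5 in the third and (1.37) in the last. Hence by strict concavity (property 5)
any state $\rho$ has entropy less than or equal to that of the completely mixed state $\frac{1}{d}I$, and in order to be equal they should
be identical.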

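Using von Neumann entropy a proof of the concavity of Shannon entropy can be provided. Let $p_i$ be a probability
distribution and, for each $i$, let $q_j^{(i)}$ be a probability distribution over $j$. Then

$$ H\Big(\sum_i p_i q_j^{(i)}\Big) = S\Big(\sum_i p_i\sum_j q_j^{(i)}|j\rangle\langle j|\Big) \geq \sum_i p_i S\Big(\sum_j q_j^{(i)}|j\rangle\langle j|\Big) = \sum_i p_i H\big(q_j^{(i)}\big), \tag{1.38} $$

where the inequality is property 5, with equality if and only if the matrices $\sum_j q_j^{(i)}|j\rangle\langle j|$ are the same, that is, the distributions $q_j^{(i)}$ are identical.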

1.2.6 Measurement in quantum physics and information theory 

As already noted in the introduction of the present section, quantum physics seems very puzzling. This is
because there are two types of evolution a quantum system can undergo: unitary evolution and measurement. One
understands that the first one is needed to preserve probability during evolution (see subsection 1.2.1). Then
why is a second type of evolution needed? Information theory can explain this, using the following results:

1. Projective measurement can increase entropy. This is derived using strict concavity. Let $P$ be a projector
and $Q \triangleq I - P$ the complementary projector; then there exist unitary matrices $U_1$, $U_2$ and a probability
$p$ such that for all $\rho$, $P\rho P + Q\rho Q = pU_1\rho U_1^\dagger + (1-p)U_2\rho U_2^\dagger$ (refer to A.2 for a proof), thus

$$ S(P\rho P + Q\rho Q) = S\left(pU_1\rho U_1^\dagger + (1-p)U_2\rho U_2^\dagger\right) \geq pS\left(U_1\rho U_1^\dagger\right) + (1-p)S\left(U_2\rho U_2^\dagger\right) = pS(\rho) + (1-p)S(\rho) = S(\rho), \tag{1.39} $$

where the inequality is property 5 and the following equality is property 2.
Because of strict concavity the equality holds if and only if $P\rho P + Q\rho Q = \rho$.






2. General measurement can decrease entropy. One can convince oneself by considering a qubit in state
$\rho = \frac{1}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, which is not pure, thus $S(\rho) > 0$, and which is measured using the measurement
matrices $M_1 = |0\rangle\langle 0|$ and $M_2 = |0\rangle\langle 1|$. If the result of the measurement is unknown, then the state of the
system afterwards is $\rho' = M_1\rho M_1^\dagger + M_2\rho M_2^\dagger = |0\rangle\langle 0|$, which is pure, hence $S(\rho') = 0 < S(\rho)$.

3. Unitary evolution preserves entropy. This is already seen as von Neumann's entropy property 2.
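
The three results above can be verified on single-qubit examples. The following is a minimal sketch, assuming NumPy and base-2 logarithms. The state $(|0\rangle+|1\rangle)/\sqrt{2}$ used for result 1, the biased state and the rotation angle used for result 3 are illustrative choices made here; result 2 uses the measurement matrices $M_1$, $M_2$ given in the text.

```python
import numpy as np

def von_neumann_bits(rho):
    """Von Neumann entropy in bits from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

ket0 = np.array([[1.0], [0.0]])
ket1 = np.array([[0.0], [1.0]])

# Result 1: a projective measurement with unknown outcome can increase entropy.
ket_plus = (ket0 + ket1) / np.sqrt(2)          # assumed example state
rho_pure = ket_plus @ ket_plus.T
P = ket0 @ ket0.T
Q = np.eye(2) - P
rho_projected = P @ rho_pure @ P + Q @ rho_pure @ Q

# Result 2: the general measurement {M1, M2} from the text can decrease entropy.
rho_mixed = np.eye(2) / 2
M1 = ket0 @ ket0.T
M2 = ket0 @ ket1.T
rho_reset = M1 @ rho_mixed @ M1.T + M2 @ rho_mixed @ M2.T

# Result 3: a unitary (here a real rotation, an assumed example) preserves entropy.
theta = 0.4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rho_biased = np.diag([0.8, 0.2])
rho_rotated = U @ rho_biased @ U.T

print(von_neumann_bits(rho_pure), von_neumann_bits(rho_projected))
print(von_neumann_bits(rho_mixed), von_neumann_bits(rho_reset))
print(von_neumann_bits(rho_biased), von_neumann_bits(rho_rotated))
```

Note that $M_1^\dagger M_1 + M_2^\dagger M_2 = I$, so $\{M_1, M_2\}$ is a valid measurement even though $M_2$ is not a projector; this is what allows the entropy to drop.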

Now one should remember the information theoretic interpretation of entropy given throughout this chapter: 
entropy is the amount of knowledge one has about a system. Result 3 instructs that if only unitary evolutions 
were present in quantum theory, then no knowledge on any physical system could exist! One is relieved by 
seeing that knowledge can decrease or increase by measurements, as seen by results 1 and 2, and of course that 
is what measurements were meant to be in the first place. 
Chapter 2

Some important results of information theory



Some very important results derived from information theory, which will be useful for the development of the
next two chapters, concern the amount of accessible information and how data can be processed. These are
presented in section 2.1 and in section 2.2 respectively, both for the classical and the quantum case.

2.1 Accessible information 

Information is not always perfectly known, for example during a transmission over a noisy channel there is a 
possibility of information loss. This means that obtaining upper bounds of accessible information can be very 
useful in practice. These upper bounds are calculated in subsection 2.1.1 for the classical and in subsection 
2.1.2 for the quantum case. 

2.1.1 Accessible classical information: Fano's inequality 

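Of major importance in classical information theory is the amount of information that can be extracted from
a random variable $X$ based on the knowledge of another random variable $Y$. This is given as an upper
bound for $H(X|Y)$, which is calculated next. Suppose $\tilde{X} \triangleq f(Y)$ is some function which is used
as the best guess for $X$. Let $p_e \triangleq p\left(X \neq \tilde{X}\right)$ be the probability that this guess is incorrect. Then an 'error'
random variable can be defined,

$$ E \triangleq \begin{cases} 1, & X \neq \tilde{X}, \\ 0, & X = \tilde{X}, \end{cases} $$

thus $H(E) = H(p_e)$. Using the chaining rule for conditional entropies, $H(E,X|Y) = H(E|X,Y) + H(X|Y)$
(Shannon entropy, property 8); however $E$ is completely determined once $X$ and $Y$ are known, so $H(E|X,Y) = 0$
and hence

$$ H(E,X|Y) = H(X|Y). \tag{2.1} $$

Applying the chaining rule for conditional entropies (Shannon entropy, property 8) again, but for different variables,

$$ H(E,X|Y) = H(X|E,Y) + H(E|Y), \tag{2.2} $$

and further, because conditioning reduces entropy (Shannon entropy, property 7), $H(E|Y) \leq H(E) = H(p_e)$,
whence by (2.1) and (2.2)

$$ H(X|Y) = H(E,X|Y) \leq H(X|E,Y) + H(p_e). \tag{2.3} $$

Finally, $H(X|E,Y)$ can be bounded by noting that when $E = 0$ the value of $X$ is determined by $Y$, while when $E = 1$ the
variable $X$ can take at most $|X| - 1$ values, so that $H(X|E,Y) \leq p_e\log(|X| - 1)$. Together with (2.3) this gives Fano's inequality

$$ H(p_e) + p_e\log(|X| - 1) \geq H(X|Y). \tag{2.4} $$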
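
Fano's inequality, $H(p_e) + p_e\log(|X|-1) \geq H(X|Y)$, can be checked on a small example. The sketch below assumes NumPy and base-2 logarithms; the joint distribution is an arbitrary illustrative choice, and the guess $f(Y)$ is taken to be the most likely value of $X$ given $Y$, which is one possible choice of guessing function rather than the only one.

```python
import numpy as np

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[x, y] = p(X = x, Y = y)."""
    p_y = joint.sum(axis=0)
    total = 0.0
    for x in range(joint.shape[0]):
        for y in range(joint.shape[1]):
            if joint[x, y] > 0:
                total -= joint[x, y] * np.log2(joint[x, y] / p_y[y])
    return float(total)

def fano_sides(joint):
    """Return (H(X|Y), Fano bound, p_e) for the maximum-likelihood guess f(y)."""
    guess = joint.argmax(axis=0)                       # f(y) = most likely x for each y
    p_correct = sum(joint[guess[y], y] for y in range(joint.shape[1]))
    p_e = 1.0 - p_correct
    bound = binary_entropy(p_e) + p_e * np.log2(joint.shape[0] - 1)
    return conditional_entropy(joint), float(bound), float(p_e)

# Illustrative (assumed) joint distribution of a three-letter X and its noisy copy Y.
joint = np.array([[0.30, 0.03, 0.02],
                  [0.02, 0.25, 0.03],
                  [0.03, 0.02, 0.30]])
h_x_given_y, fano_bound, p_error = fano_sides(joint)
print(h_x_given_y, fano_bound, p_error)
```

In this example the guess is wrong with probability 0.15 and the two sides of the inequality are close, which illustrates that the bound can be nearly tight.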
2.1.2 Accessible quantum information: quantum Fano inequality and the Holevo bound

Quantum Fano inequality 

There exists a relation analogous to (2.4) in quantum information theory, named the quantum Fano inequality:

$$ S(\rho,\mathcal{E}) \leq H(F(\rho,\mathcal{E})) + (1 - F(\rho,\mathcal{E}))\log\left(d^2 - 1\right), \tag{2.5} $$

where $F(\rho,\mathcal{E})$ is the entanglement fidelity of a quantum operation, defined in (B.1); for more details refer to
appendix B.2. In the above equation the entropy exchange of the operation $\mathcal{E}$ upon $\rho$ was introduced,

$$ S(\rho,\mathcal{E}) \triangleq S(R',Q'), $$

which is a measure of the noise caused by $\mathcal{E}$ on a quantum system $Q$ ($\rho \equiv \rho^Q$), purified by $R$. The prime
notation is used to indicate the states after the application of $\mathcal{E}$. Note that the entropy exchange does not
depend upon the way in which the initial state of $Q$ is purified by $R$. This is because any two purifications of $Q$
into $RQ$ are related by a unitary operation on the system $R$ [1, p. 111], and because of von Neumann entropy,
property 2.

The quantum Fano inequality (2.5) is proven by taking an orthonormal basis $|i\rangle$ for the system $RQ$, chosen
so that the first state in the set is $|1\rangle = |RQ\rangle$. Forming the quantities $p_i \triangleq \langle i|\rho^{R'Q'}|i\rangle$, it follows
from (1.39) that $S(R',Q') \leq H(p_1,\ldots,p_{d^2})$, and with some simple algebra
$H(p_1,\ldots,p_{d^2}) = H(p_1) + (1-p_1)H\left(\frac{p_2}{1-p_1},\ldots,\frac{p_{d^2}}{1-p_1}\right) \leq H(p_1) + (1-p_1)\log\left(d^2-1\right)$, and since $p_1 \equiv F(\rho,\mathcal{E})$, q.e.d.

The Holevo bound 

Another result giving an upper bound of accessible quantum information is the Holevo bound [26]

$$ H(X:Y) \leq S(\rho) - \sum_x p_x S(\rho_x), \tag{2.6} $$

where $\rho = \sum_x p_x\rho_x$. Moreover the right hand side of this inequality is useful in quantum information theory,
and hence it is given a special name: the Holevo $\chi$ quantity. Concerning its proof, assume that someone, named
$P$, prepares some quantum information system $Q$ with states $\rho_X$, where $X = 0,\ldots,n$, having probabilities
$p_0,\ldots,p_n$. The quantum information $Q$ is going to be measured by another person, $M$, using POVM elements
$\{E_y\} = \{E_0,\ldots,E_m\}$ on the state, and the measurement will have an outcome $Y$. The state of the total system before measurement
will then be

$$ \rho^{PQM} = \sum_x p_x|x\rangle\langle x|\otimes\rho_x\otimes|0\rangle\langle 0|, $$

where the tensor product is taken in the order $PQM$. The matrix $|0\rangle\langle 0|$ represents the initial state of the
measurement system, which holds before getting any information. The measurement is described by an operator
$\mathcal{E}$ that acts on each state $\sigma$ of $Q$ by measuring it with POVM elements $\{E_y\}$ and storing the outcome on $M$.
This can be expressed by

$$ \mathcal{E}(\sigma\otimes|0\rangle\langle 0|) = \sum_y\sqrt{E_y}\,\sigma\sqrt{E_y}\otimes|y\rangle\langle y|. $$

The quantum operation $\mathcal{E}$ is trace preserving. To see this, first notice that it is made up of operation elements
$\left\{\sqrt{E_y}\otimes U_y\right\}$, where $U_y|y'\rangle = |y'+y\rangle$, with $+$ the modulo $n+1$ addition. Of course $U_y$ is unitary, since it is
a map taking the basis vector $|y'\rangle$ to another basis vector $|y'+y\rangle$, and hence it is a change of basis from one basis
to a cyclic permutation of the same basis. Now because the $E_y$ are POVM elements, $I = \sum_y E_y = \sum_y\sqrt{E_y}^\dagger\sqrt{E_y} \Rightarrow \sum_y\sqrt{E_y}^\dagger\sqrt{E_y}\otimes U_y^\dagger U_y = I$, q.e.d.

Subsequently primes are used to denote states after the application of $\mathcal{E}$, and unprimed notation for states
before its application. Note now that $S(P:Q) = S(P:Q,M)$, since $M$ is initially uncorrelated with $P$ and $Q$,
and because by applying the operator $\mathcal{E}$ it is not possible to increase mutual information (von Neumann entropy,
property 10), $S(P:Q,M) \geq S(P':Q',M')$. Finally, discarding $Q'$ cannot increase mutual information (von Neumann entropy,
property 9), so $S(P':Q',M') \geq S(P':M')$. Putting these results together,

$$ S(P':M') \leq S(P:Q). \tag{2.7} $$
The last equation, with a little algebra, is understood to be the Holevo bound. The quantity on the right of
(2.7) can be found by noting that $\rho^{PQ} = \sum_x p_x|x\rangle\langle x|\otimes\rho_x$, hence $S(P) = H(p_x)$, $S(Q) = S(\rho)$, and
$S(P,Q) = H(p_x) + \sum_x p_x S(\rho_x)$ (von Neumann entropy, property 6), thus

$$ S(P:Q) = S(P) + S(Q) - S(P,Q) = S(\rho) - \sum_x p_x S(\rho_x). $$

Now the left hand side of (2.7) is found by noting that after the measurement

$$ \rho^{P'Q'M'} = \sum_{xy}p_x|x\rangle\langle x|\otimes\sqrt{E_y}\,\rho_x\sqrt{E_y}\otimes|y\rangle\langle y|. $$

Tracing out the system $Q'$ and using the observation that the joint distribution $p(x,y)$ for the pair $(X,Y)$
satisfies $p(x,y) = p_x\,p(y|x) = p_x\,\mathrm{tr}(\rho_x E_y) = p_x\,\mathrm{tr}\left(\sqrt{E_y}\,\rho_x\sqrt{E_y}\right)$, it is straightforward to see that
$\rho^{P'M'} = \sum_{xy}p(x,y)\,|x\rangle\langle x|\otimes|y\rangle\langle y|$, whence

$$ S(P':M') = S(P') + S(M') - S(P',M') = H(X) + H(Y) - H(X,Y) = H(X:Y), $$

q.e.d.
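
The Holevo bound can be illustrated with a small ensemble. The following is a minimal sketch, assuming NumPy and base-2 logarithms. The ensemble (the states $|0\rangle$ and $(|0\rangle+|1\rangle)/\sqrt{2}$ with equal probabilities) and the measurement in the $\{|0\rangle,|1\rangle\}$ basis are illustrative assumptions, and the function names are hypothetical. The mutual information is computed from $p(x,y) = p_x\,\mathrm{tr}(\rho_x E_y)$, as in the proof above.

```python
import numpy as np

def shannon_bits(probabilities):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(probabilities, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def von_neumann_bits(rho):
    return shannon_bits(np.linalg.eigvalsh(rho))

def holevo_chi(probs, states):
    """S(rho) - sum_x p_x S(rho_x), the right hand side of the Holevo bound."""
    average = sum(p * rho for p, rho in zip(probs, states))
    return von_neumann_bits(average) - sum(p * von_neumann_bits(rho)
                                           for p, rho in zip(probs, states))

def measured_mutual_information(probs, states, povm):
    """H(X:Y) for the joint distribution p(x, y) = p_x tr(rho_x E_y)."""
    joint = np.array([[p * np.trace(rho @ E).real for E in povm]
                      for p, rho in zip(probs, states)])
    return (shannon_bits(joint.sum(axis=1)) + shannon_bits(joint.sum(axis=0))
            - shannon_bits(joint))

ket0 = np.array([1.0, 0.0])
ket_plus = np.array([1.0, 1.0]) / np.sqrt(2)
probs = [0.5, 0.5]
states = [np.outer(ket0, ket0), np.outer(ket_plus, ket_plus)]
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]    # projective measurement, a special POVM

chi = holevo_chi(probs, states)
information = measured_mutual_information(probs, states, povm)
print(information, chi)
```

Although one classical bit is encoded, the two states are not orthogonal, so $\chi$ is below one bit, and this particular measurement extracts even less than $\chi$.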



2.2 Data processing 

As is widely known, information, apart from being stored and transmitted, is also processed. In subsection
2.2.1 data processing is defined for the classical case and the homonymous inequality is proven. An analogous
definition and inequality are demonstrated for the quantum case in subsection 2.2.2.

2.2.1 Classical data processing

Classical data processing can be described in mathematical terms by a chain of random variables

$$ X_1 \rightarrow X_2 \rightarrow X_3 \rightarrow \cdots \rightarrow X_n, \tag{2.8} $$

where $X_i$ is the $i$-th step of processing. Of course each step depends only on the information gained by
the previous one, that is $p(X_{n+1} = x_{n+1}|X_n = x_n,\ldots,X_1 = x_1) = p(X_{n+1} = x_{n+1}|X_n = x_n)$, which defines a Markov
chain. But, as already emphasized, information is a physical entity which can be distorted by noise. Thus
if $X \rightarrow Y \rightarrow Z$ is a Markov chain representing an information process, one can prove

$$ H(X) \geq H(X:Y) \geq H(X:Z), \tag{2.9} $$

which is known as the data processing inequality. This inequality gives mathematical expression to the following
physical truth: if a system described by a random variable $X$ is subjected to noise, producing $Y$, then further
data processing cannot be used to increase the amount of mutual information between the output of the process and the
original information $X$; once information is lost, it cannot be restored. It is worth mentioning that in a data
processing chain $X \rightarrow Y \rightarrow Z$, the information a system $Z$ shares with $X$ must be information which $Z$ also shares
with $Y$; the information is 'pipelined' from $X$ through $Y$ to $Z$. This is described by the data pipelining inequality

$$ H(Z:Y) \geq H(Z:X). $$

This is derived from (2.9) by noting that $X \rightarrow Y \rightarrow Z$ is a Markov chain if and only if

$$ p(Z = z|Y = y, X = x) = p(Z = z|Y = y) \;\Leftrightarrow\; \frac{p(X = x, Y = y, Z = z)}{p(X = x, Y = y)} = \frac{p(Y = y, Z = z)}{p(Y = y)} $$

$$ \Leftrightarrow\; \frac{p(X = x, Y = y, Z = z)}{p(Y = y, Z = z)} = \frac{p(X = x, Y = y)}{p(Y = y)} \;\Leftrightarrow\; p(X = x|Y = y, Z = z) = p(X = x|Y = y), $$

that is, if and only if $Z \rightarrow Y \rightarrow X$ is a Markov chain too. In the above proof there is no problem with null probabilities
in the denominators, because then every probability would be null, and the proven result would still hold.
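
The data processing and data pipelining inequalities can be checked on a simple Markov chain. The sketch below assumes NumPy and base-2 logarithms; the input distribution and the two binary symmetric channels (flip probabilities 0.1 and 0.2) are illustrative assumptions, not values from the text.

```python
import numpy as np

def shannon_bits(probabilities):
    """Shannon entropy in bits of a (possibly multi-dimensional) distribution."""
    p = np.asarray(probabilities, dtype=float).ravel()
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint):
    """H(A:B) = H(A) + H(B) - H(A,B) for joint[a, b] = p(A = a, B = b)."""
    return (shannon_bits(joint.sum(axis=1)) + shannon_bits(joint.sum(axis=0))
            - shannon_bits(joint))

def binary_symmetric_channel(flip):
    """Transition matrix with entry [input, output]."""
    return np.array([[1 - flip, flip],
                     [flip, 1 - flip]])

p_x = np.array([0.3, 0.7])
p_y_given_x = binary_symmetric_channel(0.1)
p_z_given_y = binary_symmetric_channel(0.2)

# Markov chain X -> Y -> Z: p(x, y, z) = p(x) p(y|x) p(z|y).
p_xyz = np.einsum('x,xy,yz->xyz', p_x, p_y_given_x, p_z_given_y)

h_x = shannon_bits(p_x)
i_xy = mutual_information(p_xyz.sum(axis=2))
i_xz = mutual_information(p_xyz.sum(axis=1))
i_yz = mutual_information(p_xyz.sum(axis=0))
print(h_x, i_xy, i_xz, i_yz)
```

Each additional noisy step strictly lowers the information about $X$ in this example, and $Z$ shares more information with $Y$ than with $X$, as the pipelining inequality requires.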
2.2.2 Quantum data processing

The quantum analogue of data processing (2.8) is described by a chain of quantum operations

$$ \rho \rightarrow \mathcal{E}_1(\rho) \rightarrow (\mathcal{E}_2\circ\mathcal{E}_1)(\rho) \rightarrow \cdots \rightarrow (\mathcal{E}_n\circ\cdots\circ\mathcal{E}_2\circ\mathcal{E}_1)(\rho). $$

In the above model each step of the process is obtained by the application of a quantum operation. By defining the
coherent information,

$$ I(\rho,\mathcal{E}) \triangleq S(\mathcal{E}(\rho)) - S(\rho,\mathcal{E}), \tag{2.10} $$

it can be proven that

$$ S(\rho) \geq I(\rho,\mathcal{E}_1) \geq I(\rho,\mathcal{E}_2\circ\mathcal{E}_1), \tag{2.11} $$

which corresponds to the classical data processing inequality (2.9).
Chapter 3

Real-life applications of information theory



After a theoretical presentation of information theory one should look over its real-life applications. Two
extremely useful applications of information theory are data compression, discussed in section 3.1, and transmission
of information over noisy channels, which is the main topic of section 3.2.



3.1 Data compression 

Nowadays data compression is a widely applied procedure; everybody uses .zip archives, listens to .mp3 music,
watches videos in .mpeg format and exchanges photos in .jpg files. Although in all these cases special techniques
depending on the type of data are used, the general philosophy underlying data compression is inherited from
Shannon's noiseless channel coding theorem [7, 8], discussed in subsection 3.1.1. In the quantum case data
compression was theoretically found to be possible in 1995 by Schumacher [27] and is presented in subsection
3.1.2.



3.1.1 Shannon's noiseless channel coding theorem 

Shannon's main idea was to estimate the physical resources needed to represent a given information content, which
as already seen in section 1.1 are related to entropy. As a simple example to understand how the theorem works,
consider a binary alphabet to be compressed. In this alphabet assume that 0 occurs with probability $1-p$ and 1
with probability $p$. Then if $n$-bit strings are formed, they will contain approximately $(1-p)n$ zero bits and $pn$ one bits, with
a probability that approaches one as $n$ grows, according to the law of large numbers. Since of all the
possible strings these are the most likely, they are usually named typical sequences. Counting
all the arrangements of ones and zeros, there are in total $\binom{n}{np}$ such strings. Using Stirling's approximation
$n! \approx \left(\frac{n}{e}\right)^n$ one finds that

$$ \begin{aligned} \binom{n}{np} = \frac{n!}{(np)!\,[n(1-p)]!} &\approx \frac{\left(\frac{n}{e}\right)^n}{\left(\frac{np}{e}\right)^{np}\left(\frac{n(1-p)}{e}\right)^{n(1-p)}} = \frac{n^n}{(np)^{np}\,(n(1-p))^{n(1-p)}} \\ &= 2^{n\log n - np\log np - n(1-p)\log n(1-p)} = 2^{n[-p\log p - (1-p)\log(1-p)]} = 2^{nH(p)}. \end{aligned} \tag{3.1} $$
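
The estimate $\binom{n}{np} \approx 2^{nH(p)}$ can be checked directly with exact integer arithmetic. This is a minimal sketch using only the Python standard library; the value $p = 0.3$ and the block lengths are illustrative assumptions. It compares $\frac{1}{n}\log_2\binom{n}{np}$ with $H(p)$.

```python
from math import comb, log2

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def count_exponent(n, p):
    """(1/n) log2 of the number of n-bit strings with round(n p) ones."""
    ones = round(n * p)
    return log2(comb(n, ones)) / n

p = 0.3
entropy = binary_entropy(p)
# Gap between the entropy and the exact exponent, for growing block length n.
gaps = {n: entropy - count_exponent(n, p) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(n, round(entropy - gap, 4), round(entropy, 4))
```

The exact exponent stays below $H(p)$ and approaches it as $n$ grows; the remaining gap comes from the factors that the simplified form of Stirling's approximation used above leaves out.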

Generalizing the above argument to an alphabet of $k$ letters $x_i \in X$ with occurrence probabilities $p(x_i)$, the
number of typical sequences can be easily calculated using combinatorics as before, and is found to be

$$ \binom{n}{np(x_1), np(x_2), \ldots, np(x_{k-1})} = \frac{n!}{\prod_{x\in X}(np(x))!} \approx 2^{nH(X)}. \tag{3.2} $$

Here a letter need not be a single symbol; it can also be a sequence of symbols, such as a word of the English language.
Obviously the probability of such a sequence occurring is approximately $2^{-nH(X)}$. This approximate probability
gives a very intuitive way of understanding the mathematical terminology of Shannon's noiseless theorem, where
a sequence $y = x_{i_1}x_{i_2}x_{i_3}\cdots x_{i_n}$ is defined to be $\epsilon$-typical if the probability of its occurrence is

$$ 2^{-n(H(X)+\epsilon)} \leq p(y) \leq 2^{-n(H(X)-\epsilon)}. $$

The set of all such sequences is denoted $T(n,\epsilon)$. Another very useful way of writing this result is

$$ \left|\frac{1}{n}\log\left(\frac{1}{p(x_{i_1}x_{i_2}\cdots x_{i_n})}\right) - H(X)\right| \leq \epsilon. \tag{3.3} $$

Now the theorem of typical sequences can be stated.

Theorem of typical sequences:

1. Fix $\epsilon > 0$; then for any $\delta > 0$, for sufficiently large $n$, the probability that a sequence is $\epsilon$-typical is at
least $1 - \delta$.

2. For any fixed $\epsilon > 0$ and $\delta > 0$, for sufficiently large $n$, $(1-\delta)\,2^{n(H(X)-\epsilon)} \leq |T(n,\epsilon)| \leq 2^{n(H(X)+\epsilon)}$.

3. Let $S(n)$ be a collection of size at most $2^{nR}$ of length-$n$ sequences, where $R < H(X)$ is fixed. Then
for any $\delta > 0$ and for sufficiently large $n$, $\sum_{y\in S(n)}p(y) \leq \delta$.
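
The first two statements of the theorem can be explored exactly for a binary source, because every sequence with the same number of ones has the same probability. The following is a minimal sketch using only the Python standard library; the source ($p = 0.3$), the tolerance ($\epsilon = 0.05$) and the block lengths are illustrative assumptions, and the function name is hypothetical.

```python
from math import comb, log2

def binary_entropy(p):
    """H(p) in bits for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

def typical_set_statistics(n, p, epsilon):
    """Return (probability, log2 of size) of the epsilon-typical set T(n, epsilon)
    for n independent bits that equal 1 with probability p (0 < p < 1)."""
    entropy = binary_entropy(p)
    probability = 0.0
    size = 0
    for ones in range(n + 1):
        # log2 of the probability of any single sequence with this many ones.
        log_prob = ones * log2(p) + (n - ones) * log2(1 - p)
        if abs(-log_prob / n - entropy) <= epsilon:   # typicality condition (3.3)
            count = comb(n, ones)
            size += count
            probability += 2.0 ** (log2(count) + log_prob)
    return probability, log2(size)

p, epsilon = 0.3, 0.05
results = {n: typical_set_statistics(n, p, epsilon) for n in (10, 100, 1000)}
for n, (probability, log_size) in results.items():
    print(n, round(probability, 4), round(log_size / n, 4))
```

The probability of the typical set grows towards one with $n$, while its size stays below $2^{n(H(X)+\epsilon)}$, far smaller than the $2^n$ possible sequences.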

The above theorem is proven by using the law of large numbers. Moreover Shannon's noiseless channel
coding theorem is just an application of it. Shannon implemented a compression scheme
which is just a map of an $n$-bit sequence $y = x_{i_1}x_{i_2}x_{i_3}\cdots x_{i_n}$ to another one of $nR$ bits, denoted by $C_n(y)$.
Of course in such a compression scheme an inverse map $D_n$ (with $D_n\circ C_n = \mathrm{id}_{X^n}$) should exist, which is naturally
named the decompression scheme. However the set of typical sequences, in non-trivial cases, is only a
subset of all the possible sequences, and this leads to failure of the schemes when they are asked to
map the complementary subset, known as the subset of atypical sequences. Further nomenclature may be
added by saying that a compression-decompression scheme $(C_n, D_n)$ is reliable if the probability that
$D_n(C_n(y)) = y$ approaches one as $n$ approaches infinity. It is time to state the theorem.

Shannon's noiseless channel coding theorem:

Assume an alphabet $X$; then if $R > H(X)$ is chosen there exists a reliable compression-decompression
scheme of rate $R$. Conversely, if $R < H(X)$ any such scheme will not be reliable.

This theorem reveals a remarkable operational interpretation of the entropy rate $H(X)$: it is just the
minimal physical resources necessary and sufficient to reliably transmit data. Finally it should be stressed that
one can have a perfectly reliable compression-decompression scheme just by extending the maps to atypical
sequences; in this case there is just a high probability that no more than $nR$ resources are needed to carry the
information.



3.1.2 Schumacher's quantum noiseless channel coding theorem 



It is quite surprising that quantum information can also be compressed, as was proven by Schumacher.
Assume that the quantum information transmitted can be in states $|x_i\rangle \in H^{\otimes n}$ with probability $p(x_i)$. This
is described by the density matrix $\rho = \sum_i p(x_i) |x_i\rangle\langle x_i|$. A compression-decompression scheme of rate R
consists of two quantum operations $C_n$ and $D_n$ analogous to the maps defined for the classical case. The
compression operation $C_n$ takes states from $H^{\otimes n}$ to $H^{\otimes nR}$ and the decompression $D_n$ returns them back,
as Figure 3.1 demonstrates. One can define a sequence $x_{i_1} x_{i_2} x_{i_3} \cdots x_{i_n}$ as $\epsilon$-typical by a relation resembling
the classical (3.3)



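$$\left| \frac{1}{n} \log \frac{1}{p(x_{i_1})\, p(x_{i_2}) \cdots p(x_{i_n})} - S(\rho) \right| \leq \epsilon.$$

A state $|x_{i_1}\rangle |x_{i_2}\rangle \cdots |x_{i_n}\rangle$ is said to be $\epsilon$-typical if the sequence $x_{i_1} x_{i_2} \cdots x_{i_n}$ is $\epsilon$-typical. The $\epsilon$-typical
subspace will be denoted $T(n, \epsilon)$ and the projector onto this subspace will be

$$P(n, \epsilon) = \sum_{x \ \epsilon\text{-typical}} |x_{i_1}\rangle\langle x_{i_1}| \otimes |x_{i_2}\rangle\langle x_{i_2}| \otimes \cdots \otimes |x_{i_n}\rangle\langle x_{i_n}|.$$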



Now the quantum typical sequences theorem can be stated.

Typical subspace theorem:

1. Fix $\epsilon > 0$; then for any $\delta > 0$, for sufficiently large n, $\mathrm{tr}(P(n, \epsilon) \rho^{\otimes n}) \geq 1 - \delta$.

2. For any fixed $\epsilon > 0$ and $\delta > 0$, for sufficiently large n, $(1 - \delta)\, 2^{n(S(\rho) - \epsilon)} \leq |T(n, \epsilon)| \leq 2^{n(S(\rho) + \epsilon)}$.

3. Let $S(n)$ be a projector onto any subspace of $H^{\otimes n}$ of dimension at most $2^{nR}$, where $R < S(\rho)$ is fixed.
Then for any $\delta > 0$ and for sufficiently large n, $\mathrm{tr}(S(n) \rho^{\otimes n}) \leq \delta$.
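The first two statements of the typical subspace theorem can be checked numerically for a single-qubit source. The sketch below assumes a density matrix with eigenvalues (0.1, 0.9), n = 400 copies and a tolerance of 0.1; these values and the function names are illustrative choices, not taken from the text. Since the n-fold tensor power has eigenvalues $\lambda^k (1-\lambda)^{n-k}$ with multiplicity $\binom{n}{k}$, the dimension of the typical subspace and its weight follow from a sum over k.

```python
import math


def von_neumann_entropy(eigs):
    """S(rho) in bits, computed from the eigenvalues of rho."""
    return -sum(l * math.log2(l) for l in eigs if l > 0)


def typical_subspace(lam, n, eps):
    """Dimension of T(n, eps) and tr(P(n, eps) rho^(tensor n)) for a qubit
    density matrix with eigenvalues (lam, 1 - lam)."""
    S = von_neumann_entropy([lam, 1 - lam])
    dim, weight = 0, 0.0
    for k in range(n + 1):
        # Eigenvalue lam^k (1 - lam)^(n - k) occurs comb(n, k) times.
        log_inv = -(k * math.log2(lam) + (n - k) * math.log2(1 - lam))
        if abs(log_inv / n - S) <= eps:
            mult = math.comb(n, k)
            dim += mult
            weight += mult * lam ** k * (1 - lam) ** (n - k)
    return dim, weight, S


lam, n, eps = 0.1, 400, 0.1      # assumed example values
dim, weight, S = typical_subspace(lam, n, eps)
qubits_needed = math.log2(dim)   # compare with n*(S - eps) and n*(S + eps)
```

With these values most of the weight already lies in the typical subspace, while its dimension stays well below the full dimension $2^n$.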

Following the same principles, the quantum version of Shannon's theorem, as proved by Schumacher, is the following.

Schumacher's noiseless channel coding theorem:

Let $\rho$ describe a quantum information source in some Hilbert space H. If $R > S(\rho)$, then there exists a reliable
compression scheme of rate R. Conversely, if $R < S(\rho)$, any compression scheme of rate R is not reliable.





Figure 3.1: Quantum data compression. The compression operation $C_n$ compresses a quantum source $\rho$ stored
in $n \log d$ qubits into $nS(\rho)$ qubits. The source is accurately recovered via the decompression operation $D_n$.

The compression scheme found by Schumacher is

$$C_n(\sigma) \equiv P(n, \epsilon)\, \sigma\, P(n, \epsilon) + \sum_i |0\rangle\langle i| \sigma |i\rangle\langle 0|,$$

where $|i\rangle$ is an orthonormal basis for the orthocomplement of the typical subspace, and $|0\rangle$ is some standard
state. As one can see, this quantum operation takes any state $\sigma$ from $H^{\otimes n}$ to $H^{\otimes nR}$, the subspace of $\epsilon$-typical
sequences, if $\sigma$ can be compressed, and if not it gives as outcome the standard state $|0\rangle$, which is meant to be
a failure. Finally $D_n$ was found to be the identity map on $H^{\otimes nR}$, which obviously maps any compressed state
back to $H^{\otimes nR} \leq H^{\otimes n}$.

3.2 Information over noisy channels 

It is a fact of everyday life that communication channels are imperfect and always subject to noise, which
distorts the transmitted information. This of course prevents reliable communication without some special control
of the information transmitted and received. One can use error correction in order to achieve such control,
which is summarized in subsection 3.2.1. However there are some general theoretical results concerning such
transmissions, which help calculate the capacity of noisy channels. The cases of classical information over noisy
classical and quantum channels are presented in subsections 3.2.2 and 3.2.3 respectively. A summary of the
results to date for quantum information over noisy quantum channels is given in subsection 3.2.4.

3.2.1 Error correction 

Error correction is a practical procedure for transmission of information over noisy channels. The intuitive idea 
behind it is common in every day life. As an example recall a typical telephone conversation. If the connection 
is of low quality, people communicating often need to repeat their words, in order to protect their talk against 
the noise. Moreover sometimes during a telephonic communication, one is asked to spell a word. Then by saying 
words whose initials are the letters to be spelled, misunderstanding is minimized. If someone wants to spell the 
word "phone", he can say "Parents", "Hotel", "Oracle", "None" and "Evangelist". If instead of saying "None", 
he said "New", the person at the other side of the line could possibly hear "Mew". This example demonstrates
why words should be carefully selected. One can see that the last two words differ by only one letter, so their
Hamming distance is small (refer to appendix B.1 for a definition); hence one should select words with higher
distances.

The error correction procedure, as presented in the last paragraph, is the encoding and transmission of a
message longer than the one intended to be communicated, containing enough information to reconstruct the initial
message with high probability. If one wishes to encode a k-bit word x into an n-bit word y (k < n), then this
encoding-decoding scheme is named [n, k], and is represented by a function C(x) = y. For a codeword x of the code it is
often written $x \in C$. One can avoid misunderstandings similar to "New" and "Mew", as found in the above paragraph, by
requiring any two codewords to be at Hamming distance greater than d. Then after receiving a word y, one tries to
find to which Hamming sphere Sph(x, d) around a codeword x it belongs, and then identifies the received word with the center
of the sphere, x. Such a code is denoted by [n, k, d].
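The decoding rule just described, identifying a received word with the closest codeword, can be sketched in a few lines. The five-bit repetition code and the function names below are assumptions made for illustration only.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(received, codewords):
    """Decode by picking the codeword closest in Hamming distance."""
    return min(codewords, key=lambda c: hamming_distance(received, c))


code = ["00000", "11111"]    # assumed example: a five-bit repetition code
# Distance of the code: smallest distance between two distinct codewords.
d = min(hamming_distance(a, b) for a in code for b in code if a != b)
t = (d - 1) // 2             # number of bit errors that can be corrected
decoded = nearest_codeword("01001", code)
```

The spelling example of the text corresponds to `hamming_distance("New", "Mew")`, which is 1, so a single misheard letter is enough to confuse the two words.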

The basic notions of classical and quantum error correction are summarized in the next paragraphs.

Classical error correction 

From all the possible error correction codes, a subset is going to be presented here, the linear ones. A member
of this subset, namely an [n, k] linear code, is modeled by an $n \times k$ matrix G, often called the generator. The
k-bit message x is treated as a column vector, and the encoded n-bit message is $Gx$, where the entries of
both G and x are elements of $\mathbb{Z}_2$, that is zeros and ones, and all the operations are performed modulo 2.

Linear codes are used because an [n, k] code needs only nk bits to be represented. In a general code
C, each of the $2^k$ words would correspond to an n-bit string, and to do this a table of $n 2^k$ bits is needed. In contrast,
by using linear codes much memory is saved and the encoding program is more efficient.

In order to perform error correction one takes an $(n - k) \times n$ matrix H, named the parity check, having the
property $HG = 0$. Then for every codeword $y = Gx$ it is obvious that $Hy = 0$. Now if noise was present
during transmission, one receives a word $y' = y + e$, where e is the error that occurred. Hence $Hy' = Hy + He = He$.
Usually $Hy'$ is called the error syndrome. From the error syndrome one can identify the initial y if $t \geq d(y', y) =
d(y + e, y) = d(e, 0)$, by checking in which sphere $y' \in \mathrm{Sph}(y, d)$ lies. To do this the distance of the code
must be defined by $d \equiv d(C) \equiv \min_{x, y \in C,\, x \neq y} d(x, y)$, that is the spheres of radius d must be distinct. Then if
$d \geq 2t + 1$, up to t bits can be corrected. All this is under the assumption that the probability that the channel
flips a bit is less than $\frac{1}{2}$.
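As a concrete instance of the generator and parity check matrices, the sketch below uses the [7, 4] Hamming code, which is not discussed in the text and is assumed here only as a familiar example. The particular matrices G and H and the helper names are illustrative choices; the columns of H are the binary representations of 1 to 7, so the syndrome of a single bit flip directly gives the position of the flipped bit.

```python
def mat_vec_mod2(M, v):
    """Matrix-vector product over Z_2."""
    return [sum(m * x for m, x in zip(row, v)) % 2 for row in M]


# Generator (7 x 4) and parity check (3 x 7) of a [7, 4] Hamming code.
G = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 1, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]


def encode(x):
    return mat_vec_mod2(G, x)


def syndrome(y):
    return mat_vec_mod2(H, y)


def correct(y):
    """Correct at most one flipped bit using the error syndrome."""
    s = syndrome(y)
    position = 4 * s[0] + 2 * s[1] + s[2]   # 0 means no error detected
    y = list(y)
    if position:
        y[position - 1] ^= 1
    return y


x = [1, 0, 1, 1]
y = encode(x)
y_noisy = list(y)
y_noisy[4] ^= 1              # the channel flips the fifth bit
```

Here `syndrome(y)` vanishes for the codeword, `syndrome(y_noisy)` equals the fifth column of H, and `correct(y_noisy)` restores y.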

It is easy to check that linear codes [n, k, d] must satisfy the Singleton bound

$$n - k \geq d - 1. \qquad (3.4)$$

One can further prove that for large n there exists an [n, k] error-correcting code, protecting against errors on t bits for
some k, such that

$$\frac{k}{n} \geq 1 - H_{\mathrm{bin}}\left(\frac{2t}{n}\right). \qquad (3.5)$$

This is known as the Gilbert-Varshamov bound.
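Both bounds are easy to evaluate numerically. The following sketch assumes the Gilbert-Varshamov bound in the form $k/n \geq 1 - H_{\mathrm{bin}}(2t/n)$, with $H_{\mathrm{bin}}$ the binary entropy; the function names and the sample parameters are illustrative.

```python
import math


def h_bin(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def satisfies_singleton(n, k, d):
    """Singleton bound n - k >= d - 1 for an [n, k, d] code."""
    return n - k >= d - 1


def gilbert_varshamov_rate(n, t):
    """Rate k/n guaranteed to be achievable for large n: 1 - H_bin(2t/n)."""
    return 1 - h_bin(2 * t / n)


rate = gilbert_varshamov_rate(1000, 10)   # assumed example: n = 1000, t = 10
```

For these sample values a rate of roughly 0.86 is guaranteed, so protecting against ten errors in a thousand bits costs about 14% of the bits.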

Some further definitions are needed. Suppose an [n, k] code C is given; then its dual is denoted $C^\perp$, and has
as generator matrix $H^T$ and parity check $G^T$. Thus the words in $C^\perp$ are orthogonal to C. A code is said to be
weakly self-dual if $C \subseteq C^\perp$, and strictly self-dual if $C = C^\perp$.

Quantum error correction 

As far as quantum information theory is concerned, the errors occurring are not of the same nature as in the classical
case. One has to deal, apart from bit flip errors, with phase flip errors. The first codes found to be able to correct both
of them are named Calderbank-Shor-Steane (CSS) codes after their inventors [28, 29]. Assume $C_1$ and $C_2$ are $[n, k_1]$ and
$[n, k_2]$ classical linear codes such that $C_2 \subset C_1$ and both $C_1$ and $C_2^\perp$ can correct t errors. Then an $[n, k_1 - k_2]$
quantum code $\mathrm{CSS}(C_1, C_2)$ is defined, capable of correcting errors on t qubits, named the CSS code of $C_1$
over $C_2$, via the following construction. Suppose $x \in C_1$; then define

$$|x + C_2\rangle \equiv \frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} |x + y\rangle,$$

where the addition is modulo 2. If now $x'$ is an element of $C_1$ such that $x - x' \in C_2$, then it is easy to verify that
$|x + C_2\rangle = |x' + C_2\rangle$, and hence $|x + C_2\rangle$ depends only upon the coset of $C_1 / C_2$ to which x belongs. Furthermore, if x and $x'$
belong to different cosets of $C_2$, then for no $y, y' \in C_2$ does $x + y = x' + y'$, and therefore $\langle x + C_2 | x' + C_2 \rangle = 0$.
The quantum code $\mathrm{CSS}(C_1, C_2)$ is defined to be the vector space spanned by the states $|x + C_2\rangle$ for all $x \in C_1$.
The number of cosets of $C_2$ in $C_1$ is $|C_1| / |C_2|$, so $\dim(\mathrm{CSS}(C_1, C_2)) = |C_1| / |C_2| = 2^{k_1 - k_2}$ and therefore $\mathrm{CSS}(C_1, C_2)$ is
an $[n, k_1 - k_2]$ quantum code.
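The coset structure behind the CSS construction can be made explicit on a small example. The sketch below assumes the pair of codes used for the seven-qubit Steane code, which is not named in the text: $C_1$ is the [7, 4] Hamming code and $C_2$ is its dual, a [7, 3] code contained in $C_1$. All function names are illustrative. Each state $|x + C_2\rangle$ is represented by a dictionary from bit strings to amplitudes.

```python
import itertools
import math


def add(a, b):
    """Bitwise addition modulo 2 of two bit tuples."""
    return tuple((x + y) % 2 for x, y in zip(a, b))


def span(generators):
    """All linear combinations over Z_2 of the given bit tuples."""
    words = set()
    for coeffs in itertools.product([0, 1], repeat=len(generators)):
        word = (0,) * len(generators[0])
        for c, g in zip(coeffs, generators):
            if c:
                word = add(word, g)
        words.add(word)
    return words


H_ROWS = [(0, 0, 0, 1, 1, 1, 1),
          (0, 1, 1, 0, 0, 1, 1),
          (1, 0, 1, 0, 1, 0, 1)]
C2 = span(H_ROWS)                              # [7, 3] code
C1 = span(H_ROWS + [(1, 1, 1, 1, 1, 1, 1)])    # [7, 4] Hamming code


def css_state(x):
    """Amplitudes of |x + C2>: uniform superposition over the coset."""
    amplitude = 1 / math.sqrt(len(C2))
    return {add(x, y): amplitude for y in C2}


def inner(state_a, state_b):
    return sum(amp * state_b.get(word, 0.0) for word, amp in state_a.items())


cosets = {frozenset(css_state(x)) for x in C1}
dimension = len(cosets)                        # equals |C1| / |C2|
```

With these codes there are two cosets, so the resulting quantum code encodes $k_1 - k_2 = 1$ qubit into 7, and states built from different cosets have disjoint supports and are therefore orthogonal.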

The quantum error correction exploits the classical error-correcting properties of $C_1$ and $C_2^\perp$.
In the quantum case there is, like in the classical one, a possibility of a bit flip error, given by a vector $e_1$, but additionally
of a phase flip error, denoted here by $e_2$. Because of noise the original state $|x + C_2\rangle |0\rangle$ has then changed
to

$$\frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} (-1)^{(x+y) \cdot e_2} |x + y + e_1\rangle |0\rangle,$$

where an ancilla system $|0\rangle$ to store the syndrome for $C_1$ is introduced. Applying the parity matrix $H_1$ of $C_1$ to the
deformed state gives

$$\frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} (-1)^{(x+y) \cdot e_2} |x + y + e_1\rangle |H_1 e_1\rangle,$$

and by measuring the ancilla system the error syndrome is obtained and one corrects the error by applying NOT gates
to the flipped bits as indicated by $e_1$, giving

$$\frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} (-1)^{(x+y) \cdot e_2} |x + y\rangle.$$

If Hadamard gates are applied to each qubit of this state, phase flip errors are detected. One then obtains

$$\frac{1}{\sqrt{|C_2|\, 2^n}} \sum_{z} \sum_{y \in C_2} (-1)^{(x+y) \cdot (e_2 + z)} |z\rangle,$$

and defining $z' \equiv z + e_2$, one can write the last state as

$$\frac{1}{\sqrt{|C_2|\, 2^n}} \sum_{z'} \sum_{y \in C_2} (-1)^{(x+y) \cdot z'} |z' + e_2\rangle.$$

Assuming $z' \in C_2^\perp$, it is easy to verify that $\sum_{y \in C_2} (-1)^{y \cdot z'} = |C_2|$, while if $z' \notin C_2^\perp$ then
$\sum_{y \in C_2} (-1)^{y \cdot z'} = 0$. Hence the state is further rewritten as

$$\frac{1}{\sqrt{2^n / |C_2|}} \sum_{z' \in C_2^\perp} (-1)^{x \cdot z'} |z' + e_2\rangle,$$

which resembles a bit flip error described by the vector $e_2$. Following the same procedure as for the bit flips,
the state is finally quantum error-corrected as desired.

A quantum analogue of the Gilbert-Varshamov bound is proven for the CSS codes, guaranteeing the existence
of good quantum codes. In the limit where n becomes large, an [n, k] quantum code protecting against errors on up to t qubits exists for
some k such that $\frac{k}{n} \geq 1 - 2 H_{\mathrm{bin}}\left(\frac{2t}{n}\right)$.

Concluding this summary on error correction, it is useful for quantum cryptography to define codes by

$$|x + C_2\rangle \equiv \frac{1}{\sqrt{|C_2|}} \sum_{y \in C_2} (-1)^{u \cdot y} |x + y + v\rangle, \qquad (3.6)$$

parametrized by u and v, and named $\mathrm{CSS}_{u,v}(C_1, C_2)$, which are equivalent to $\mathrm{CSS}(C_1, C_2)$.

3.2.2 Classical information over noisy classical channels

The theoretical study of transmission of information over noisy channels is motivated by Shannon's corresponding
theorem, which demonstrates the existence of codes capable of realizing it, without giving clues on how they
could be constructed. To model transmission of information over a noisy channel, a finite input alphabet X and
a finite output alphabet Y are considered; if a letter $x \in X$ is transmitted by one side over the noisy channel,
then a letter $y \in Y$ is received by the other with probability $p(y|x)$, where of course $\sum_y p(y|x) = 1$ for all x.
The channel will be assumed memoryless in the sense of a Markov process, where the action on the currently
transmitted letter is independent of the previous ones.

Now the process of transmitting information according to Shannon's noisy channel coding theorem uses the
result of the noiseless one. According to this theorem it is always possible to pick a reliable compression-decompression
scheme of rate R. Then a message M can be viewed as one of the possible $2^{nR}$ typical strings and
encoded using the map $C_n : \{1, \ldots, 2^{nR}\} \to X^n$, which assigns to each message M an n-sequence of the input alphabet.
This sequence is sent over the noisy channel and decoded using the map $D_n : Y^n \to \{1, \ldots, 2^{nR}\}$. This
procedure is shown in Figure 3.2. It is very natural for a given encoding-decoding pair to define the probability
of error as the maximum probability over all messages M that the decoded output of the channel is not equal
to M,

$$p(C_n, D_n) \equiv \max_M \, p(D_n(Y) \neq M \,|\, X = C_n(M)).$$

Then it is said that a rate R is achievable if there exists a sequence of encoding-decoding pairs $(C_n, D_n)$
with $p(C_n, D_n)$ approaching zero as n approaches infinity. The capacity $C(\mathcal{N})$ of a noisy
channel $\mathcal{N}$ is defined as the supremum over all the achievable rates for the channel and is going to be calculated
in the following paragraph.

For the calculation, random coding will be used, that is $2^{nR}$ strings will be chosen from the possible
input strings, which with high probability will belong to the set of typical ones. If these strings are sent over the
channel, a message belonging to the set $Y^n$ will be received. But for each received letter Y there is an ignorance
about X given by $H(X|Y)$. Hence for each letter, $2^{H(X|Y)}$ inputs could have been sent, which means that
in total there are $2^{nH(X|Y)}$ possible sent messages. In order to achieve reliable decoding the string received
must be close to one of the $2^{nR}$ initially chosen strings. Then decoding can be modeled by drawing a Hamming
sphere of radius $\delta$ around the received message, containing $2^{n[H(X|Y) + \delta]}$ possible input strings. In case exactly
one input string belongs to this sphere the encoding-decoding scheme will work reliably. It is unlikely that
no word will be contained in the Hamming sphere, but it should be checked whether other input strings are
contained in it. Each decoding sphere contains a fraction

$$\frac{2^{n[H(X|Y) + \delta]}}{2^{nH(X)}} = 2^{-n[H(Y:X) - \delta]}$$

of the typical input strings. If there are $2^{nR}$ strings, where R can be related to the Gilbert-Varshamov bound in
equation (3.5), the probability that one falls in the decoding sphere by accident is

$$2^{nR}\, 2^{-n[H(Y:X) - \delta]} = 2^{-n[H(Y:X) - R - \delta]}.$$

Since $\delta$ can be chosen arbitrarily small, R can be chosen to be as close to $H(Y : X)$ as desired. Now taking
the maximum over the prior probabilities of the strings, Shannon's result is found.

Figure 3.2: The noisy channel coding problem for classical messages. It is required that every one of the $2^{nR}$
possible messages should be sent uncorrupted through the channel with high probability.

Shannon's noisy channel coding theorem:

For a noisy channel $\mathcal{N}$ the capacity is given by

$$C(\mathcal{N}) = \max_{p(x)} H(Y : X),$$

where the maximum is taken over all input distributions $p(x)$ (a priori distributions) for X, for
one use of the channel, and Y is the corresponding induced random variable at the output of the
channel.

It should be noted that the capacity found in the above mentioned theorem is the maximum one can get for
the noisy channel $\mathcal{N}$.
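The maximization in the theorem can be carried out numerically for a simple channel. The sketch below assumes a binary symmetric channel that flips each bit with probability 0.1 (an example not taken from the text) and scans the input distribution on a grid; the function names are illustrative. The mutual information is computed as $H(Y : X) = H(Y) - H(Y|X)$.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(px, channel):
    """H(Y:X) = H(Y) - H(Y|X), where channel[x][y] = p(y|x)."""
    n_out = len(channel[0])
    py = [sum(px[x] * channel[x][y] for x in range(len(px)))
          for y in range(n_out)]
    h_y_given_x = sum(px[x] * entropy(channel[x]) for x in range(len(px)))
    return entropy(py) - h_y_given_x


def capacity_binary_input(channel, steps=1000):
    """Grid search over input distributions (q, 1 - q) of a binary input."""
    return max(mutual_information([q / steps, 1 - q / steps], channel)
               for q in range(steps + 1))


flip = 0.1                                   # assumed bit flip probability
bsc = [[1 - flip, flip], [flip, 1 - flip]]   # binary symmetric channel
capacity = capacity_binary_input(bsc)
```

For the binary symmetric channel the maximum is attained by the uniform input distribution, and the result agrees with the closed form $1 - H_{\mathrm{bin}}(0.1)$, roughly 0.53 bits per use of the channel.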



3.2.3 Classical information over noisy quantum channels 

The case of sending classical information over noisy quantum channels is quite similar to the classical channel.

Each message is selected out of the $2^{nR}$ chosen by random coding, as was done for the classical case. Suppose
now a message M is about to be sent and the i-th letter, denoted by $M_i \in \{1, 2, \ldots, k\}$, is encoded in
the quantum states $\{\rho_1, \rho_2, \ldots, \rho_k\}$ of potential inputs of a noisy channel represented by a quantum operation
$\mathcal{E}$. Then the message M sent is written as a tensor product $\rho_M = \rho_{M_1} \otimes \rho_{M_2} \otimes \cdots \otimes \rho_{M_n}$. Because of noise, the
channel has some impact on the transmitted states, such that the output states are $\sigma_{M_i} = \mathcal{E}(\rho_{M_i})$; thus the
total impact on the message M will be denoted $\sigma_M = \mathcal{E}^{\otimes n}(\rho_M)$. The receiver must decode the message $\sigma_M$
in a way similar to the one for the noisy classical channel. Now because the channel is quantum, a set of
POVM measurements is going to describe the outcome of information on the part of the receiver. To be more
specific, to every message M a POVM operator $E_M$ is going to correspond. The probability of successfully
decoding the message will be $\mathrm{tr}(\sigma_M E_M)$, and therefore the probability of an error being made for the message
M is $p^e_M = 1 - \mathrm{tr}(\sigma_M E_M)$.

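The average probability of making an error while choosing from one of the $2^{nR}$ messages is

$$p_{av} \equiv \frac{\sum_M p^e_M}{2^{nR}} = \frac{\sum_M \left[ 1 - \mathrm{tr}(\sigma_M E_M) \right]}{2^{nR}}. \qquad (3.7)$$

Now the POVM operators $E_M$ can be constructed as follows. Let $\epsilon > 0$, and assume $p_j$ is a probability
distribution over the indices $\{1, 2, \ldots, k\}$ of the letters, named the a priori distribution. Then for the space of the
output alphabet a density matrix can be defined, $\bar{\sigma} \equiv \sum_j p_j \sigma_j$, and let P be the projector onto the $\epsilon$-typical
subspace of $\bar{\sigma}^{\otimes n}$. By the theorem of quantum typical sequences, it follows that for any $\delta > 0$ and for sufficiently
large n, $\mathrm{tr}(\bar{\sigma}^{\otimes n} (I - P)) \leq \delta$. For a given message M the notion of $\epsilon$-typical subspace for $\sigma_M$ can be defined,
based on the idea that typically $\sigma_M$ is a tensor product of about $np_1$ copies of $\sigma_1$, $np_2$ copies of $\sigma_2$ and so on. Define
$\bar{S} \equiv \sum_j p_j S(\sigma_j)$. Suppose $\sigma_j$ has a spectral decomposition

$$\sigma_j = \sum_k \lambda^j_k |e^j_k\rangle\langle e^j_k|,$$

so $\sigma_M = \sum_K \lambda^M_K |E^M_K\rangle\langle E^M_K|$, where $K = (K_1, \ldots, K_n)$, and for convenience
$\lambda^M_K \equiv \lambda^{M_1}_{K_1} \lambda^{M_2}_{K_2} \cdots \lambda^{M_n}_{K_n}$ and $|E^M_K\rangle \equiv |e^{M_1}_{K_1}\rangle |e^{M_2}_{K_2}\rangle \cdots |e^{M_n}_{K_n}\rangle$
are defined. Finally the projector $P_M$ is defined as the projector onto the space spanned by all $|E^M_K\rangle$ such that

$$\left| \frac{1}{n} \log \frac{1}{\lambda^M_K} - \bar{S} \right| \leq \epsilon. \qquad (3.8)$$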



Moreover the law of large numbers implies that for any $\delta > 0$ and for sufficiently large n, $\mathbf{E}[\mathrm{tr}(\sigma_M P_M)] \geq 1 - \delta$,
where the expectation is taken with respect to the distribution over the strings $\rho_M$, hence $\mathbf{E}[\mathrm{tr}(\sigma_M (I - P_M))] \leq \delta$.
Also note that by the definition (3.8) the dimension of the subspace onto which $P_M$ projects can be at most
$2^{n(\bar{S} + \epsilon)}$, and thus $\mathbf{E}[\mathrm{tr}(P_M)] \leq 2^{n(\bar{S} + \epsilon)}$. Now the POVM operators are defined

$$E_M \equiv \left( \sum_{M'} P P_{M'} P \right)^{-1/2} P P_M P \left( \sum_{M'} P P_{M'} P \right)^{-1/2}. \qquad (3.9)$$

To explain this construction intuitively, up to small corrections $E_M$ is equal to the projector $P_M$ and the
measurements $\{E_M\}$ correspond essentially to checking whether the output of the channel falls into the subspace
on which $P_M$ projects. This can be thought of as analogous to the Hamming sphere around the output. Using (3.7)
and (2.9) one can find that $\mathbf{E}[p_{av}] \leq 4\delta + (2^{nR} - 1)\, 2^{-n[S(\bar{\sigma}) - \bar{S} - 2\epsilon]}$. Provided $R < S(\bar{\sigma}) - \bar{S}$ it follows that
$\mathbf{E}[p_{av}]$ approaches zero as n approaches infinity. These were the main steps to prove the following theorem.

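Holevo-Schumacher-Westmoreland (HSW) theorem:

Let $\mathcal{E}$ be a trace-preserving quantum operation. Define

$$\chi(\mathcal{E}) \equiv \max_{\{p_j, \rho_j\}} \left[ S\left( \mathcal{E}\left( \sum_j p_j \rho_j \right) \right) - \sum_j p_j S(\mathcal{E}(\rho_j)) \right], \qquad (3.10)$$

where the maximum is over all ensembles $\{p_j, \rho_j\}$ of possible input states $\rho_j$ to the channel. Then
$\chi(\mathcal{E})$ is the product state capacity for the channel $\mathcal{E}$, that is, $\chi(\mathcal{E}) = C^{(1)}(\mathcal{E})$.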

In the aforementioned theorem the symbol $C^{(1)}(\mathcal{E})$ is used to denote the capacity of the channel, but just
in the case of product states. Whether this kind of capacity might be exceeded if the input states are prepared
in entangled states is not known and it is one of the many interesting open questions of quantum information
theory. It should be emphasized that, like in the case of a classical channel, the capacity found here is the
maximum one can get for the noisy channel $\mathcal{E}$.

Finally the maximization in equation (3.10) is potentially over an unbounded set; therefore for practical
reasons one takes the maximum over ensembles of pure states (refer to [2, p. 212-214] for more details).
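The quantity maximized in the HSW theorem, $S(\mathcal{E}(\sum_j p_j \rho_j)) - \sum_j p_j S(\mathcal{E}(\rho_j))$, can be evaluated for a single qubit ensemble. The sketch below makes several assumptions that are not in the text: the channel is the depolarizing channel $\mathcal{E}(\rho) = p\, I/2 + (1 - p) \rho$, states are described by their Bloch vectors r (so that the eigenvalues are $(1 \pm |r|)/2$), and the ensemble consists of the two pure states $|0\rangle$ and $|1\rangle$ with equal probability. Since only one ensemble is evaluated, the result is a lower bound on the maximum in (3.10).

```python
import math


def h_bin(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def qubit_entropy(r):
    """Von Neumann entropy of the qubit state with Bloch vector r."""
    norm = math.sqrt(sum(c * c for c in r))
    return h_bin((1 + norm) / 2)


def depolarize(r, p):
    """Depolarizing channel: the Bloch vector shrinks by a factor 1 - p."""
    return tuple((1 - p) * c for c in r)


def holevo_quantity(ensemble, channel):
    """Entropy of the average output minus the average output entropy.
    ensemble is a list of (probability, bloch_vector) pairs."""
    outputs = [(q, channel(r)) for q, r in ensemble]
    average = tuple(sum(q * r[i] for q, r in outputs) for i in range(3))
    return qubit_entropy(average) - sum(q * qubit_entropy(r)
                                        for q, r in outputs)


p = 0.2                                        # assumed noise strength
ensemble = [(0.5, (0.0, 0.0, 1.0)),            # |0>
            (0.5, (0.0, 0.0, -1.0))]           # |1>
chi = holevo_quantity(ensemble, lambda r: depolarize(r, p))
```

For this ensemble the average output is the maximally mixed state, so the value reduces to $1 - H_{\mathrm{bin}}(1 - p/2)$, about 0.53 bits for p = 0.2.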



3.2.4 Quantum information over noisy quantum channels 

Unfortunately, to date there is no complete understanding of quantum channel capacity. As far as
the present state of knowledge is concerned, the most important results were already presented in subsections 2.1.2 and 2.2.2,
concerning respectively the accessible quantum information and quantum data processing. One should also mention
that there exists a quantum analogue of equation (3.4), the quantum Singleton bound, which is $n - k \geq 2(d - 1)$
for an [n, k, d] quantum error correcting code.

However an additional comment should be made. The coherent information defined in (2.10), because
of its role in quantum data processing (equation (2.11) compared with (2.9)), is believed to be the quantum
analogue of the mutual information $H(X : Y)$, and hence perhaps related to quantum channel capacity. This
intuitive argument is yet unproven. For some progress on this hypothesis, see [1, p. 606] and the references
therein.


Chapter 4

Practical quantum cryptography 



One of the most interesting applications of quantum information theory, and the only one realized until now,
is quantum cryptography. For an overview of the history and the methods of classical cryptography, and for a
simple introduction to quantum cryptography, refer to [30].

It is widely known that there exist many secure cryptographic systems, like for example RSA [31].
Then why is quantum cryptography needed? The reason is that, as long as quantum laws hold, it is theoretically
unbreakable. In addition to this, known classical cryptographic systems, like RSA, appear to be breakable by
quantum computers [1, 30], using quantum factoring and quantum discrete logarithms [20, 21], or by methods
found in quantum search algorithms [22, 23].

In this chapter the basic notions of quantum cryptography and a proof of its security are analyzed in section
4.1. Then the possibility of constructing a commercial device capable of performing quantum cryptography is
discussed in section 4.2.

4.1 Theoretical principles of quantum cryptography 

Cryptography usually concerns two parties A and B, willing to communicate securely, and possibly an
eavesdropper E. These parties are sometimes given human names for simplicity, calling A Alice, B Bob and E
Eve. The situation is visualized in Figure 4.1.

Figure 4.1: Alice and Bob communicating with the fear of Eve stealing information.

Quantum mechanics can be used to secure communication between Alice and Bob. To achieve this a 
quantum line will be used to send a randomly produced cryptographic key (see Figure 4.3). This can be done 
using protocols described in subsection 4.1.1. Moreover Alice and Bob need a classical line to discuss their 
result, as described by the same protocol, and send the encrypted message. 

The encryption and the decryption of the message are done using a string K, the key, which should be of equal
length to the message M. Then by applying the modulo 2 addition $\oplus$ to each bit of the strings, the encrypted
message is $E = M \oplus K$. Finally the message is decrypted by a further addition of the key,
$(M \oplus K) \oplus K = M \oplus (K \oplus K) = M \oplus 0 = M$.

Figure 4.2: The setup of the BB84 quantum key distribution protocol. The quantum line is used for the distribution
of the key, while the classical line is used for discussing security topics and for the transmission of encrypted messages.
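The encryption and decryption by modulo 2 addition of the key can be sketched directly. In the example below the key is drawn from Python's `secrets` module, which merely stands in for the key that Alice and Bob would obtain from the quantum line; the message and the function name are illustrative.

```python
import secrets


def xor_bits(a, b):
    """Bitwise modulo 2 addition of two equal-length bit lists."""
    if len(a) != len(b):
        raise ValueError("the key must be as long as the message")
    return [x ^ y for x, y in zip(a, b)]


message = [1, 0, 1, 1, 0, 0, 1, 0]                 # M
key = [secrets.randbits(1) for _ in message]       # K, same length as M
ciphertext = xor_bits(message, key)                # E = M xor K
recovered = xor_bits(ciphertext, key)              # (M xor K) xor K = M
```

Decryption works because adding the key to itself gives the all-zero string, whatever the key is.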



4.1.1 The BB84 quantum key distribution protocol 

Looking back at Figure 1.2, presented on page 7, one can see from the basic features of quantum mechanics that
it is not easy for Eve to steal information, since by the no-cloning theorem it cannot be copied. Moreover, it is
impossible for her to distinguish non-orthogonal states, and any information gain related to such a distinction
involves a disturbance. Then Alice and Bob, by sacrificing some piece of information and checking
the coincidence of their keys, can verify whether Eve was listening to them. Motivated by the above arguments,
Bennett and Brassard presented in 1984 the first quantum cryptographic protocol [32], described next.

Suppose Alice has an n-bit message. Then in order to generate a cryptographic key of equal length she begins
with two randomly produced $(4 + \delta)n$-bit strings a and b. She must then encode these into a $(4 + \delta)n$-qubit
string

$$|\psi\rangle = \bigotimes_{k=1}^{(4+\delta)n} |\psi_{a_k b_k}\rangle,$$

where $a_k$ is the k-th bit of a (and similarly for b), and each qubit just mentioned can be

$$|\psi_{00}\rangle = |0\rangle, \quad |\psi_{10}\rangle = |1\rangle, \quad |\psi_{01}\rangle = |+\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}, \quad |\psi_{11}\rangle = |-\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}}, \qquad (4.1)$$

where $|+\rangle$ is produced by application of the Hadamard gate on $|0\rangle$, and $|-\rangle$ by applying the same gate on
$|1\rangle$. The above procedure encodes a in the basis $\{|0\rangle, |1\rangle\}$ if $b_k = 0$, or in $\{|+\rangle, |-\rangle\}$ if $b_k = 1$. The states of
each basis are not orthogonal to the ones of the other basis, and hence they cannot be distinguished without
disturbance. The parameter $\delta$ is used as a tolerance due to noise of the channel.

The pure state $|\psi\rangle\langle\psi|$ is sent over the quantum channel, and Bob receives $\mathcal{E}(|\psi\rangle\langle\psi|)$. He announces the
receipt in the public channel. At this stage Alice, Bob and Eve each have their own states, described by their
own density matrices. Moreover Alice has not revealed b during the transmission, and Eve has difficulty in
identifying a by measurement, since she does not know in which basis she should do it, and hence any attempt to
measure disturbs $\mathcal{E}(|\psi\rangle\langle\psi|)$. However $\mathcal{E}(|\psi\rangle\langle\psi|) \neq |\psi\rangle\langle\psi|$ in general, because the channel can be noisy. The
above description implies that a and b should be completely random, because any ability of Eve to infer anything
about these strings would jeopardize the security of the protocol. Note that quantum mechanics offers a way
of perfect random number generation: just by applying the Hadamard gate on the $|0\rangle$ state, one obtains $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$,
and after a measurement either $|0\rangle$ or $|1\rangle$ is returned, each with probability $p = \frac{1}{2}$.

In order to find their key Bob proceeds in measuring $\mathcal{E}(|\psi\rangle\langle\psi|)$ in the bases determined by a string $b'$ he
generated randomly too. This way he measures the qubits and finds some string $a'$. After the end of his
measurements he asks Alice, on the public channel, to inform him about b. They then discard all bits $a_m$ and
$a'_m$ having $b_m \neq b'_m$, since they were measured in different bases and hence are uncorrelated. It should be
stressed that the public announcement of b does not help Eve to infer anything about a or $a'$.

At this point they both have new keys $a'_A$ and $a'_B$ of statistically approximate length 2n, and Alice selects
randomly n bits and informs Bob publicly about their values. If they find that a number of bits above a
threshold t disagree, then they stop the procedure and retry. In case of many failures there is definitely an
invader and they should locate him. In case of success, there are some approximately n bit strings $a'_A$ and $a'_B$,
not communicated in public. Then if for example Alice wants to send a message M, she encodes it, taking
$E = M \oplus a'_A$, and sends E through the public channel. Then Bob decodes it by $M' = E \oplus a'_B$. The current keys,
strings $a'_A$ and $a'_B$, should be discarded and not reused.

How can Alice's key $a'_A$ be the same as Bob's $a'_B$ in the case of a noisy channel which results in $\mathcal{E}(|\psi\rangle\langle\psi|) \neq
|\psi\rangle\langle\psi|$? This is the main topic of the next subsection.
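The sifting stage of the protocol can be imitated with classical random numbers. The sketch below is a toy model under explicit assumptions: the channel is noiseless, a measurement in the preparation basis returns the encoded bit while a measurement in the other basis returns a fair coin flip, and the optional eavesdropper uses a simple intercept-resend attack. None of these modelling choices, nor the function names, come from the text.

```python
import random


def measure(bit, prep_basis, meas_basis, rng):
    """Ideal measurement of a BB84 state: the encoded bit if the bases
    agree, otherwise a fair coin flip."""
    return bit if prep_basis == meas_basis else rng.randrange(2)


def bb84_sift(n_qubits, eavesdrop=False, seed=0):
    """Return the sifted keys of Alice and Bob."""
    rng = random.Random(seed)
    a = [rng.randrange(2) for _ in range(n_qubits)]       # Alice's bits
    b = [rng.randrange(2) for _ in range(n_qubits)]       # Alice's bases
    b_bob = [rng.randrange(2) for _ in range(n_qubits)]   # Bob's bases
    a_bob = []
    for bit, basis, basis_bob in zip(a, b, b_bob):
        if eavesdrop:
            # Intercept-resend: Eve measures in a random basis and
            # forwards the state she observed.
            basis_eve = rng.randrange(2)
            bit = measure(bit, basis, basis_eve, rng)
            basis = basis_eve
        a_bob.append(measure(bit, basis, basis_bob, rng))
    # Keep only the positions where Alice's and Bob's bases coincide.
    keep = [i for i in range(n_qubits) if b[i] == b_bob[i]]
    return [a[i] for i in keep], [a_bob[i] for i in keep]


def error_rate(key_a, key_b):
    return sum(x != y for x, y in zip(key_a, key_b)) / len(key_a)


key_alice, key_bob = bb84_sift(4000)
key_alice_eve, key_bob_eve = bb84_sift(4000, eavesdrop=True)
```

Without an eavesdropper the sifted keys coincide and contain about half of the transmitted bits. With the intercept-resend attack roughly a quarter of the sifted bits disagree, which is what the public comparison of check bits is meant to reveal.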



4.1.2 Information reconciliation and privacy amplification 

As anyone can assume, the communication over the noisy quantum channel is imperfect, $\mathcal{E}(|\psi\rangle\langle\psi|) \neq |\psi\rangle\langle\psi|$,
and hence even if there was no interference by Eve, in general $a'_A \neq a'_B$. Alice and Bob should perform an error
correction to get the same key $a^*$, by discussing over the public channel and revealing some string g related to
$a'_A$ and $a'_B$. This is named information reconciliation. In order to have a secure key they should perform privacy
amplification by removing some bits of $a^*$ to get $a''$, minimizing Eve's knowledge, since $a^*$ is related to the string
g publicly communicated. It is known that both procedures can be used with high security.

As already seen, information reconciliation is nothing more than error-correction; it turns out that privacy
amplification is related to error-correction too, and both tasks are implemented using classical codes. To be
more specific, decoding from a randomly chosen CSS code, already presented in subsection 3.2.1, can be thought
of as performing information reconciliation and privacy amplification. This can be seen by considering their
classical properties. Consider two classical linear codes $C_1$ and $C_2$ which satisfy the conditions for a t bit
error-correcting [n, m] CSS code. To perform the subsequent task the channel should be randomly tested from time to time and
seen to have fewer than t errors, including those caused by eavesdropping.

Alice should pick randomly the codes $C_1$ and $C_2$. Assume that $a'_B = a'_A + e$, where e is some error. Since
it is known that fewer than t errors occurred, if Alice and Bob both correct their states to the nearest codeword
in $C_1$, their results will be $a^*$. This step is information reconciliation. To reduce Eve's knowledge Alice and
Bob identify which of the $2^m$ cosets of $C_2$ in $C_1$ their state $a^*$ belongs to. This is done by computing the
coset $a^* + C_2$ in $C_1$. The result is their m bit key string $a''$. By virtue of Eve's knowledge about $C_2$, and the
error-correcting properties of $C_2$, this procedure can reduce Eve's mutual information with $a''$ to an acceptable
level, performing privacy amplification.



4.1.3 Privacy and security 

In order to quantify bounds in quantum cryptography, two important notions are defined in this subsection:
privacy and security.

Assume Alice sends the quantum states $\rho^A_k$, $k = 0, 1, 2, \ldots$, each with some probability $p_k$, and Bob receives $\rho^B_k = \mathcal{E}(\rho^A_k)$. The mutual
information between the result of any measurement Bob may do and Alice's value, $H(B : A)$, is bounded above
by the Holevo bound (2.6), thus $H(B : A) \leq \chi^B$, and similarly Eve's mutual information is bounded above by
$H(E : A) \leq \chi^E$. Since any excess information Bob has relative to Eve, at least above a certain threshold, can in
principle be exploited by Bob and Alice to distill a shared secret key using the techniques of the last subsection,
it makes sense to define privacy as

$$\mathcal{P} \equiv \sup_{\mathcal{S}} \left[ H(B : A) - H(E : A) \right],$$

where $\mathcal{S}$ are all the possible strategies Alice and Bob may employ on the channel. This is the maximum excess
classical information that Bob may obtain relative to Eve about Alice's quantum signal. Using the HSW theorem
(3.10), Alice and Bob may employ a strategy such that $H(B : A) = \chi^B$, while for any strategy Eve may employ,
$H(E : A) \leq \chi^E$. Thus $\mathcal{P} \geq \chi^B - \chi^E$. A lower bound may be obtained by assuming that Alice's signal states
are pure (refer to the discussion after equation (3.10) to see why), $\rho^A_k = |\psi^A_k\rangle\langle\psi^A_k|$, and that Eve initially had an
unentangled state $|0^E\rangle$, which may also be assumed pure. Suppose Eve's interaction is $|\psi^{EB}_k\rangle = U |\psi^A_k\rangle |0^E\rangle$;
since it is a pure state, the reduced density matrices $\rho^E_k$ and $\rho^B_k$ will have the same non-zero eigenvalues, and thus
the same entropies, $S(\rho^E_k) = S(\rho^B_k)$. Thus

$$\mathcal{P} \geq \chi^B - \chi^E = S(\rho^B) - \sum_k p_k S(\rho^B_k) - S(\rho^E) + \sum_k p_k S(\rho^E_k) = S(\rho^B) - S(\rho^E) = I(\rho, \mathcal{E}),$$

where the definition (2.10) was used. Note that the lower bound for privacy is
protocol independent.

A quantum key distribution (QKD) protocol is defined to be secure if, for any security parameters $s > 0$ and
$l > 0$ chosen by Alice and Bob, and for any eavesdropping strategy, either the scheme aborts, or it succeeds with
probability at least $1 - O(2^{-s})$ and guarantees that Eve's mutual information with the final key is less than
$2^{-l}$. The key string must also be essentially random.

4.1.4 A provable secure version of the BB84 protocol 

It can be proven [1, p. 593-599] that using CSS codes one can have a 100% secure quantum key distribution. 
However CSS codes need perfect quantum computation, which is not yet achieved. Fortunately there is a chance 
of using the classical properties of CSSu,.,, codes defined in (3.6) to have an equally secure classical computation 
version of BB84 protocol (refer to [1, p. 599-602] for a proof). This version is made up of the following steps: 

1. Alice creates $(4 + \delta)n$ random bits.

2. For each bit, she creates a qubit either in the $\{|0\rangle, |1\rangle\}$ or in the $\{|+\rangle, |-\rangle\}$ basis, according to a random bit
string $b$; see for example (4.1).

3. Alice sends the resulting qubits to Bob.

4. Alice chooses a random $v_k \in C_1$.

5. Bob receives the qubits, publicly announces this fact, and measures each in the $\{|0\rangle, |1\rangle\}$ or in the $\{|+\rangle, |-\rangle\}$
basis, chosen at random.

6. Alice announces $b$.

7. Alice and Bob discard those bits Bob measured in a basis other than $b$. With high probability, there are
at least $2n$ bits left; if not, they abort the protocol. Alice decides randomly on a set of $2n$ bits to continue to
use, randomly selects $n$ of these to be check bits, and announces the selection.

8. Alice and Bob publicly compare their check bits. If more than $t$ of these disagree, they abort the protocol.
Alice is left with the $n$-bit string $x$, and Bob with $x + e$.

9. Alice announces $x - v_k$. Bob subtracts this from his result, correcting it with code $C_1$ to obtain $v_k$.

10. Alice and Bob compute the coset of $v_k + C_2$ in $C_1$ to obtain the key $k$.
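
Steps 1 to 8 above can be sketched as a purely classical simulation. The following is a minimal illustration and not an implementation of a real device: the function name `bb84_sift`, its parameters, and the optional intercept-resend eavesdropper are assumptions introduced here for illustration only. A qubit is modelled by its bit value and preparation basis, and a measurement in the wrong basis is modelled by a fair coin flip.

```python
import random

def bb84_sift(n, delta=1.0, eavesdrop=False, seed=0):
    """Simulate steps 1-8 classically. Returns (errors, x, y), or None on abort."""
    rng = random.Random(seed)
    m = int((4 + delta) * n)
    bits = [rng.randint(0, 1) for _ in range(m)]       # step 1: Alice's random bits
    b = [rng.randint(0, 1) for _ in range(m)]          # step 2: 0 = {|0>,|1>}, 1 = {|+>,|->}
    bob_bases = [rng.randint(0, 1) for _ in range(m)]  # step 5: Bob's random bases

    measured = []
    for bit, basis, bob_basis in zip(bits, b, bob_bases):
        if eavesdrop:
            # Hypothetical intercept-resend attack (an assumption, not part of the protocol).
            eve_basis = rng.randint(0, 1)
            if eve_basis != basis:
                bit = rng.randint(0, 1)   # wrong basis: Eve's outcome is random
            basis = eve_basis             # Eve resends in her own basis
        # Measuring in the preparation basis returns the bit, otherwise a coin flip.
        measured.append(bit if bob_basis == basis else rng.randint(0, 1))

    # Steps 6-7: keep only the positions where Bob's basis agrees with the announced b.
    kept = [i for i in range(m) if b[i] == bob_bases[i]]
    if len(kept) < 2 * n:
        return None                       # abort the protocol
    used = rng.sample(kept, 2 * n)
    check = set(rng.sample(used, n))

    # Step 8: count disagreeing check bits; the rest form x (Alice) and y = x + e (Bob).
    errors = sum(bits[i] != measured[i] for i in check)
    x = [bits[i] for i in used if i not in check]
    y = [measured[i] for i in used if i not in check]
    return errors, x, y
```

The caller compares the returned error count with the threshold $t$ of step 8. Without an eavesdropper the check bits agree exactly in this noiseless model, while under the assumed intercept-resend attack roughly a quarter of them are expected to disagree.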



4.2 A commercial quantum cryptographic device 

After a theoretical presentation of quantum key distribution it is time to present experimental results and then
discuss the possibility of having a commercial device realizing quantum cryptography. These two topics are
discussed in subsections 4.2.1 and 4.2.2, respectively.

4.2.1 Experimental quantum key distribution



The first demonstration of quantum key distribution was performed at the IBM laboratory in 1989 [33] over
a distance of 30 cm. Since then there has been remarkable improvement: quantum key distribution has been
demonstrated over a distance of 10 km [34, 35], again at IBM, over distances exceeding 40 km, and also in installed
telecommunication fiber under Lake Geneva [36].

In most cases experimental quantum cryptography is done using single photons, and optical fibers are used
to guide them from Alice to Bob. Once the medium is chosen one should pick the right source and detectors.
Since they have to be compatible, the crucial choice is the wavelength. There are two main possibilities. Either
one chooses a wavelength around 800 nm, where efficient photon counters are commercially available, or one
chooses a wavelength compatible with today's telecommunication optical fibers, that is, near 1300 nm or 1550
nm. The first choice requires the use of special fibers, hence the installed telecommunications networks cannot
be used. The second choice requires the improvement or development of new detectors not based on silicon
semiconductors, which are transparent above 1000 nm wavelength. It is still unclear which of the two alternatives
will turn out to be the best choice.

Concerning the production of photons in the BB84 states, defined in equation (4.1), one can
choose different polarization states, which form non-orthogonal bases. Polarization can be, for example, linear,
identifying $|0\rangle$ and $|1\rangle$ with two orthogonal linear polarization states, or circular, identifying $|+\rangle$ and $|-\rangle$ with
the two circular polarization states. In practice, however, single photon states are difficult to realize, so
approximately single photon states are produced by faint laser pulses.

Finally one should detect these approximate single photon states. This is achieved using a variety of techniques.
One can choose between photomultipliers, avalanche photodiodes, multichannel plates and superconducting
Josephson junctions.

A typical experimental setup for quantum key distribution with the technology described above is shown in
Figure 4.3.

Figure 4.3: Typical system for quantum cryptography using polarization coding (LD: laser diode, BS: beamsplitter,
F: neutral density filter, PBS: polarizing beam splitter, $\lambda/2$: half waveplate, APD: avalanche photodiode).

For more experimental details on single photon quantum key distribution one should refer to [4, p. 11-29].

4.2.2 Commercial quantum cryptography

Once the experimental setup for quantum key distribution has been built, as discussed in subsection 4.2.1
and as shown in Figures 4.3 and 4.2, one needs to follow the steps presented in subsection 4.1.4 in order to
achieve provably secure quantum cryptography. Of course these steps are nothing but an algorithm, which can
be analyzed and easily implemented as a program that runs on current technology computers. Moreover,
special controlling devices are needed in order to instruct the laser diodes and the beam splitters, and to get the
results of the measurements from the avalanche photodiodes. Such hardware is already available on the market.
One should not forget that a public connection is also needed, for example the internet. For this reason there exist
patents [37] which can convert quantum cryptography from a laboratory experiment into a commercial product.
Such a patent is discussed below.

One can recast the steps of subsection 4.1.4 as the following computer program:

1. Alice's computer creates $(4 + \delta)n$ random bits, using an algorithm unknown to anybody else. (One can
also use quantum techniques, as discussed in the third paragraph of subsection 4.1.1.)

2. For each bit, Alice's computer triggers, through a controller, one of the four laser diodes, as shown in Figure
4.2, according to a random bit string $b$.

3. In this way the light beams are sent to Bob's site at a rate agreed beforehand.

4. Alice's computer, which already has the classical version of the CSS codes implemented, picks a random
$v_k \in C_1$.

5. Bob's computer instructs, through a controller, the beam splitter shown in Figure 4.2 in which basis to measure
the light beam. The selection should be made by a random algorithm. There should be a device
returning the result of the measurement of the avalanche photodiodes to Bob's computer. Bob's computer
announces the reception over an internet line to Alice's computer.

6. Alice's computer announces $b$.

7. Both computers discard those bits that Bob's computer measured in a basis other than $b$. With high probability,
there are at least $2n$ bits left; if not, they abort the protocol. Alice's computer decides randomly on a set of $2n$
bits to continue to use, randomly selects $n$ of these to be check bits, and announces the selection to Bob's
computer over the internet.

8. Both computers compare their check bits over the internet. If more than $t$ of these disagree, they abort
the protocol. Alice's computer is left with the $n$-bit string $x$, and Bob's with $x + e$.

9. Alice's computer announces $x - v_k$. Bob's computer subtracts this from its result, correcting it with code
$C_1$ to obtain $v_k$.

10. Both computers calculate the coset of $v_k + C_2$ in $C_1$ to obtain the key $k$.
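
Steps 9 and 10 are entirely classical and can be illustrated with a toy pair of codes. The sketch below assumes, purely for illustration, that $C_1$ is the [7,4] Hamming code (so $n = 7$ and $t = 1$) and that $C_2$ is its dual, which is the even-weight subcode of $C_1$; the coset of $v_k + C_2$ in $C_1$ is then labelled by the parity of $v_k$, giving a single key bit. The function names are hypothetical, and a practical system would use much longer codes.

```python
import itertools
import random

# Parity-check matrix of the [7,4] Hamming code, used here as C1 (an assumed toy choice).
# Column j (1-based) is the binary expansion of j, so the syndrome names the flipped bit.
H1 = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def xor(a, b):
    """Bitwise addition (equivalently subtraction) modulo 2."""
    return [p ^ q for p, q in zip(a, b)]

def syndrome(word):
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H1]

# All 16 codewords of C1: the 7-bit words with zero syndrome.
C1 = [list(w) for w in itertools.product([0, 1], repeat=7) if syndrome(w) == [0, 0, 0]]

def correct_with_c1(word):
    """Decode to the nearest codeword of C1 (corrects up to t = 1 error)."""
    s = syndrome(word)
    position = 4 * s[0] + 2 * s[1] + s[2]   # 0 means no error was detected
    fixed = list(word)
    if position:
        fixed[position - 1] ^= 1
    return fixed

def alice_announce(x, v_k):
    """Step 9, Alice's side: publish x - v_k."""
    return xor(x, v_k)

def bob_recover(y, announced):
    """Step 9, Bob's side: (x + e) - (x - v_k) = v_k + e, then correct with C1."""
    return correct_with_c1(xor(y, announced))

def key_from_coset(v):
    """Step 10: with C2 the even-weight subcode of C1, the coset is labelled by parity."""
    return sum(v) % 2

# Example run with one bit flipped on Bob's side.
x = [1, 0, 1, 1, 0, 0, 1]
e = [0, 0, 0, 0, 1, 0, 0]
v_k = random.choice(C1)
v_bob = bob_recover(xor(x, e), alice_announce(x, v_k))
```

In this run `v_bob` coincides with `v_k` whenever $e$ has at most one non-zero bit, so both sides derive the same key bit from `key_from_coset`.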

In conclusion, what is most important about the above implementation is that it is
completely automatic and needs almost no human intervention. Users can simply write their messages,
command the computers to send them securely, and receive them at the other side. This automatic operation is
what could make such a device commercially successful.






Summary 



The most important tools and results of classical and quantum information theory, obtained in the present
Individual Study Option, are summarized in Figure S.1.



Information Theory

Quantum: Von Neumann entropy, $S(\rho) = -\mathrm{tr}(\rho \log \rho)$
Classical: Letters always distinguishable, $N = |X|$
Quantum: Holevo bound, $H(X:Y) \le S(\rho) - \sum_x p_x S(\rho_x)$

Information-theoretic relations

Classical:
  Fano inequality: $H(p_e) + p_e \log(|X| - 1) \ge H(X|Y)$
  Mutual information: $H(X:Y) = H(Y) - H(Y|X)$
  Data processing inequality: $X \to Y \to Z$, $H(X) \ge H(X:Y) \ge H(X:Z)$
Quantum:
  Quantum Fano inequality: $H(F(\rho, \mathcal{E})) + (1 - F(\rho, \mathcal{E})) \log(d^2 - 1) \ge S(\rho, \mathcal{E})$
  Coherent information: $I(\rho, \mathcal{E}) = S(\mathcal{E}(\rho)) - S(\rho, \mathcal{E})$
  Quantum data processing inequality: $\rho \to \mathcal{E}_1(\rho) \to (\mathcal{E}_2 \circ \mathcal{E}_1)(\rho)$, $S(\rho) \ge I(\rho, \mathcal{E}_1) \ge I(\rho, \mathcal{E}_2 \circ \mathcal{E}_1)$

Noiseless channel coding

Classical: Shannon's theorem, $n_{\mathrm{bits}} = H(X)$
Quantum: Schumacher's theorem, $n_{\mathrm{qubits}} = S(\sum_x p_x \rho_x)$

Capacity of noisy channels for classical information

Classical: Shannon's noisy coding theorem, $C(\mathcal{N}) = \max_{p(x)} H(Y:X)$
Quantum: Holevo-Schumacher-Westmoreland theorem, $C^{(1)}(\mathcal{E}) = \max_{\{p_x, \rho_x\}} [S(\rho') - \sum_x p_x S(\rho'_x)]$, with $\rho'_x = \mathcal{E}(\rho_x)$ and $\rho' = \sum_x p_x \rho'_x$

Figure S.1: Summary of classical and quantum information theory.
The most important results concerning quantum cryptography are summarized in Figure S.2.

Quantum Cryptography

BB84 protocol: sends the states defined in (4.1)
Privacy of a quantum channel $\mathcal{E}$: $\mathcal{P} \ge I(\rho, \mathcal{E})$

Figure S.2: Summary of quantum cryptography.
Appendix A

Omitted proofs 



A.1 Special diagonal transformation of normal matrices

In this appendix it is demonstrated that for a $d \times d$ normal matrix $A$ there exists a set of unitary
matrices $U_i^{(A)}$ such that $\sum_{i=1}^{d} U_i^{(A)} A U_i^{(A)\dagger} = \mathrm{tr}(A) I$. Since $A$ is a normal matrix, by the spectral decomposition
theorem there exists a unitary matrix $U^{(A)}$ such that $U^{(A)} A U^{(A)\dagger}$ is diagonal, and for a diagonal matrix there exists
a unitary transformation $V_{i \leftrightarrow j}$ which interchanges the $i$-th diagonal element with the $j$-th.
To see this suppose $B$ is a $d \times d$ diagonal matrix; then the following matrix

$$
(V_{i \leftrightarrow j})_{kl} \triangleq
\begin{cases}
\delta_{kl}, & k \notin \{i,j\} \text{ or } l \notin \{i,j\} \\
1, & k = j \text{ and } l = i \\
1, & k = i \text{ and } l = j \\
0, & \text{else}
\end{cases}
$$



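can be used to exchange the $i$-th diagonal element of $B$ with the $j$-th. First, $V_{i \leftrightarrow j}$ is proven to be unitary:

$$
\begin{aligned}
(V_{i \leftrightarrow j} V_{i \leftrightarrow j}^\dagger)_{kn}
&= \sum_{l=1}^{d} (V_{i \leftrightarrow j})_{kl} (V_{i \leftrightarrow j}^\dagger)_{ln}
 = \sum_{l=1}^{d} (V_{i \leftrightarrow j})_{kl} (V_{i \leftrightarrow j})^{*}_{nl} \\
&= \sum_{l=1}^{d}
\begin{cases}
\delta_{kl}, & k \notin \{i,j\} \text{ or } l \notin \{i,j\} \\
1, & k = j \text{ and } l = i \\
1, & k = i \text{ and } l = j \\
0, & \text{else}
\end{cases}
\begin{cases}
\delta_{nl}, & n \notin \{i,j\} \text{ or } l \notin \{i,j\} \\
1, & n = j \text{ and } l = i \\
1, & n = i \text{ and } l = j \\
0, & \text{else}
\end{cases} \\
&=
\begin{cases}
\delta_{kn}, & k \notin \{i,j\} \text{ or } n \notin \{i,j\} \\
1, & k = j \text{ and } n = j \\
1, & k = i \text{ and } n = i \\
0, & \text{else}
\end{cases}
= \delta_{kn} = I_{kn}.
\end{aligned}
$$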



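The ability of $V_{i \leftrightarrow j}$ to exchange the diagonal elements of the matrix $B = \mathrm{diag}\{\ldots, b_i, \ldots, b_j, \ldots\}$ is exhibited by

$$
\begin{aligned}
(V_{i \leftrightarrow j} B V_{i \leftrightarrow j}^\dagger)_{kn}
&= \sum_{l=1}^{d} \sum_{m=1}^{d} (V_{i \leftrightarrow j})_{km} B_{ml} (V_{i \leftrightarrow j}^\dagger)_{ln}
 = \sum_{l=1}^{d} (V_{i \leftrightarrow j})_{kl}\, b_l\, (V_{i \leftrightarrow j})^{*}_{nl} \\
&= \sum_{l=1}^{d}
\begin{cases}
\delta_{kl}, & k \notin \{i,j\} \text{ or } l \notin \{i,j\} \\
1, & k = j \text{ and } l = i \\
1, & k = i \text{ and } l = j \\
0, & \text{else}
\end{cases}
\; b_l \;
\begin{cases}
\delta_{nl}, & n \notin \{i,j\} \text{ or } l \notin \{i,j\} \\
1, & n = j \text{ and } l = i \\
1, & n = i \text{ and } l = j \\
0, & \text{else}
\end{cases} \\
&=
\begin{cases}
b_n \delta_{kn}, & k \notin \{i,j\} \text{ or } n \notin \{i,j\} \\
b_i, & k = j \text{ and } n = j \\
b_j, & k = i \text{ and } n = i \\
0, & \text{else}
\end{cases}
= (\mathrm{diag}\{\ldots, b_j, \ldots, b_i, \ldots\})_{kn}.
\end{aligned}
$$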



Defining now $V_{23 \ldots d1} \triangleq V_{d-1 \leftrightarrow d} \cdots V_{2 \leftrightarrow 3} V_{1 \leftrightarrow 2}$, which is a unitary matrix as a product of unitary
matrices, $V_{23 \ldots d1} U^{(A)} A U^{(A)\dagger} V_{23 \ldots d1}^\dagger$ is a diagonal matrix in which the first diagonal place holds the second
diagonal element of $U^{(A)} A U^{(A)\dagger}$, the second diagonal place holds the third diagonal element, and so on until the
final diagonal place, where the first diagonal element stands. Writing $a_{ii}$ for the $i$-th diagonal element of
$U^{(A)} A U^{(A)\dagger}$, the next display visualizes the similarity transformation $V_{23 \ldots d1}$:

$$
\mathrm{diag}(a_{11}, a_{22}, \ldots, a_{(d-1)(d-1)}, a_{dd})
\;\xrightarrow{\;V_{23 \ldots d1}\;}\;
\mathrm{diag}(a_{22}, a_{33}, \ldots, a_{dd}, a_{11}).
$$



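Enumerating all the cyclic permutations of $(1, 2, \ldots, d)$ with a number $i$, and writing $V_i$ for the corresponding
permutation matrix, the following unitary matrices are defined: $U_i^{(A)} \triangleq V_i U^{(A)}$. It is straightforward to verify that

$$
\sum_{i=1}^{d} U_i^{(A)} A U_i^{(A)\dagger}
= \mathrm{diag}\big(a_{11} + a_{22} + \cdots + a_{dd},\; a_{22} + a_{33} + \cdots + a_{11},\; \ldots,\; a_{dd} + a_{11} + \cdots + a_{(d-1)(d-1)}\big)
= \mathrm{tr}(A) I.
$$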
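
The last step can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: it starts from a matrix that is already diagonal (the diagonalization by $U^{(A)}$ is skipped), builds the cyclic shift as a product of adjacent swaps, and uses plain nested lists; all function names are hypothetical.

```python
def identity(d):
    return [[int(k == l) for l in range(d)] for k in range(d)]

def diag(values):
    d = len(values)
    return [[values[k] if k == l else 0 for l in range(d)] for k in range(d)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def conjugate(v, b):
    """Return V B V^T (V is a real permutation matrix, so its adjoint is V^T)."""
    return matmul(matmul(v, b), transpose(v))

def swap_matrix(d, i, j):
    """Swap matrix with 0-based indices: the identity with rows i and j exchanged."""
    v = identity(d)
    v[i], v[j] = v[j], v[i]
    return v

def cyclic_shift(d):
    """Product of adjacent swaps; the swap of the first two places acts first."""
    v = identity(d)
    for i in range(d - 1):
        v = matmul(swap_matrix(d, i, i + 1), v)
    return v

def sum_over_cyclic_shifts(values):
    """Sum V_i B V_i^T over the d cyclic shifts V_i, with B = diag(values)."""
    d = len(values)
    b, shift, v = diag(values), cyclic_shift(d), identity(d)
    total = [[0] * d for _ in range(d)]
    for _ in range(d):
        term = conjugate(v, b)
        total = [[total[k][l] + term[k][l] for l in range(d)] for k in range(d)]
        v = matmul(shift, v)   # move on to the next cyclic permutation
    return total
```

For the diagonal entries 1, 2, 3, 4 every diagonal place collects each entry exactly once, so `sum_over_cyclic_shifts([1, 2, 3, 4])` is ten times the identity, in agreement with $\mathrm{tr}(A) I$.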

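A.2 Projective measurements are equivalent to an addition of unitary operations

Let $P$ be a projector and $Q = I - P$ the complementary projector. In this appendix it will be shown that there exist
unitary matrices $U_1$, $U_2$ and a probability $p$ such that for all $\rho$, $P \rho P + Q \rho Q = p U_1 \rho U_1^\dagger + (1 - p) U_2 \rho U_2^\dagger$. Choose
$p \triangleq \frac{1}{2}$, $U_1 \triangleq Q - P$ and $U_2 \triangleq Q + P = I$. It is obvious that $U_1 U_1^\dagger = (Q - P)(Q - P) = QQ - QP - PQ + PP =
Q - 0 - 0 + P = I$, and $U_2 U_2^\dagger = II = I$. Now it is easy to check that

$$
\begin{aligned}
\tfrac{1}{2} U_1 \rho U_1^\dagger + \tfrac{1}{2} U_2 \rho U_2^\dagger
&= \tfrac{1}{2} (Q - P) \rho (Q - P) + \tfrac{1}{2} (Q + P) \rho (Q + P) \\
&= \tfrac{1}{2} (Q \rho Q - Q \rho P - P \rho Q + P \rho P) + \tfrac{1}{2} (Q \rho Q + Q \rho P + P \rho Q + P \rho P) \\
&= P \rho P + Q \rho Q.
\end{aligned}
$$

q.e.d. (Abbas Edalat contributed to this proof)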
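
The identity can also be checked numerically for a concrete projector and state. The following sketch assumes small matrices stored as nested lists; the function names and the example values in the usage note are illustrative choices, not part of the proof.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a))]

def lincomb(alpha, a, beta, b):
    """Entrywise alpha * a + beta * b."""
    return [[alpha * x + beta * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def projective_measurement(rho, p):
    """P rho P + Q rho Q with Q = I - P."""
    d = len(p)
    q = [[int(k == l) - p[k][l] for l in range(d)] for k in range(d)]
    return lincomb(1, matmul(matmul(p, rho), p), 1, matmul(matmul(q, rho), q))

def unitary_mixture(rho, p):
    """(1/2) U1 rho U1^dagger + (1/2) U2 rho U2^dagger with U1 = Q - P and U2 = I."""
    d = len(p)
    u1 = [[int(k == l) - 2 * p[k][l] for l in range(d)] for k in range(d)]  # Q - P = I - 2P
    return lincomb(0.5, matmul(matmul(u1, rho), dagger(u1)), 0.5, rho)
```

For instance, with `rho = [[0.75, 0.25 - 0.125j], [0.25 + 0.125j, 0.25]]` and the projector `p = [[0.5, 0.5], [0.5, 0.5]]`, the two functions return matrices that agree entry by entry up to floating point rounding.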



Appendix B

Distance measures of information 



B.1 Classical information distance: Hamming

The Hamming distance of two strings is defined to be the number of places at which their bits differ. Assuming
$a$ and $b$ are the two $n$-bit strings, and $a_i$ and $b_i$ are their $i$-th bits, one can write

$$
d(a, b) \triangleq \sum_{i=1}^{n} |a_i - b_i| .
$$

Very naturally, a Hamming sphere of center $c$ and radius $\delta$ is defined as the set of strings whose distance
from $c$ is less than or equal to $\delta$:

$$
\mathrm{Sph}(c, \delta) = \{ s : d(c, s) \le \delta \}.
$$
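
Both definitions translate directly into code. The sketch below represents bit strings as tuples of 0 and 1 and enumerates the sphere by brute force, which is only practical for short strings; the function names are illustrative.

```python
from itertools import product

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_sphere(c, delta):
    """All bit strings of the same length as c within distance delta of c."""
    return [s for s in product((0, 1), repeat=len(c)) if hamming_distance(c, s) <= delta]
```

For a 3-bit center and radius 1 the sphere contains the center itself and the three strings obtained by flipping one bit.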

B.2 Quantum information distance: Fidelity 

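Fidelity is a measure of distance between two quantum states, defined by

$$
F(\rho, \sigma) = \mathrm{tr} \sqrt{\rho^{1/2} \sigma \rho^{1/2}} = \max_{|\psi\rangle, |\phi\rangle} |\langle \psi | \phi \rangle| ,
$$

where the maximum is taken over purifications $|\psi\rangle$ of $\rho$ and $|\phi\rangle$ of $\sigma$. It is used to determine how well a
quantum channel preserves information. To do this the following function can be defined

$$
F_{\min}(\mathcal{E}) \triangleq \min_{|\psi\rangle} F\big(|\psi\rangle, \mathcal{E}(|\psi\rangle \langle \psi|)\big),
$$

where the quantum channel is simulated via the quantum operation $\mathcal{E}$, and the minimization is considered as
the worst case of a quantum signal. Another interesting definition, for an ensemble of signal states $\rho_j$ sent with
probabilities $p_j$, is

$$
\bar{F} \triangleq \sum_j p_j F\big(\rho_j, \mathcal{E}(\rho_j)\big)^2 ,
$$

named the ensemble average fidelity. Finally it is important to quantify how much entanglement between $R$
and $Q$, sent through a quantum channel $\mathcal{E}$ (a trace preserving operation), is preserved. This can be done by the
entanglement fidelity

$$
F(\rho, \mathcal{E}) \triangleq F(RQ, R'Q')^2 = \langle RQ | \big[ (I_R \otimes \mathcal{E}) (|RQ\rangle \langle RQ|) \big] | RQ \rangle = \sum_i |\mathrm{tr}(\rho E_i)|^2 , \qquad \text{(B.1)}
$$

where the primes are for the states after the application of $\mathcal{E}$, and $E_i$ are the operation elements of $\mathcal{E}$.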
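
The last expression in (B.1) is easy to evaluate once the operation elements are known. As a hedged illustration, the sketch below assumes a single-qubit bit flip channel with flip probability $p$, whose operation elements are $\sqrt{1-p}\, I$ and $\sqrt{p}\, X$; the channel choice and the function names are assumptions made for this example only.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def trace(a):
    return sum(a[k][k] for k in range(len(a)))

def entanglement_fidelity(rho, elements):
    """Sum over the operation elements E_i of |tr(rho E_i)|^2."""
    return sum(abs(trace(matmul(rho, e))) ** 2 for e in elements)

def bit_flip_elements(p):
    """Operation elements sqrt(1-p) I and sqrt(p) X of the assumed bit flip channel."""
    a, b = math.sqrt(1 - p), math.sqrt(p)
    return [[[a, 0.0], [0.0, a]], [[0.0, b], [b, 0.0]]]
```

For the maximally mixed state $\rho = I/2$ only the identity element contributes, so the entanglement fidelity equals $1 - p$; for $\rho = |+\rangle\langle+|$, which the bit flip leaves unchanged, it equals one.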